We develop two new estimators for a general class of stationary GARCH models with possibly
heavy tailed asymmetrically distributed errors, covering processes with symmetric and asymmetric
feedback like GARCH, Asymmetric GARCH, VGARCH and Quadratic GARCH. The first
estimator arises from negligibly trimming QML criterion equations according to error extremes.
The second imbeds negligibly transformed errors into QML score equations for a Method of
Moments estimator. In this case, we exploit a sub-class of redescending transforms that includes
tail-trimming and functions popular in the robust estimation literature, and we re-center the
transformed errors to minimize small sample bias. The negligible transforms allow both identification
of the true parameter and asymptotic normality. We present a consistent estimator of the
covariance matrix that permits classic inference without knowledge of the rate of convergence.
A simulation study shows both of our estimators trump existing ones for sharpness and approximate
normality including QML, Log-LAD, and two types of non-Gaussian QML (Laplace and
Power-Law). Finally, we apply the tail-trimmed QML estimator to financial data.
1. Introduction

It is now widely accepted that log-returns of many macroeconomic and financial time
series are heavy tailed, exhibit clustering of large values, and are asymmetrically distributed.
In broader contexts extremes are encountered in actuarial, meteorological, and
telecommunication network data (e.g., Leadbetter et al. [38], Embrechts et al. [21], Davis
[17]), while GARCH-type clustering alone implies higher moments do not exist due to
Pareto-like distribution tails (e.g., Basrak et al. [4], Liu [42]).

We develop new methods of robust estimation for a general class of GARCH(1,1)
models:

$y_t = \sigma_t \epsilon_t$ with $\sigma_t^2 = g(y_{t-1}, \sigma_{t-1}^2, \theta^0) > 0$ a.s., (1)

where $g(y, \sigma^2, \theta)$ is a known mapping $g: \mathbb{R} \times [0,\infty) \times \Theta \to [0,\infty)$ and $\Theta$ is a compact subset
of $\mathbb{R}^q$ for some finite $q \geq 1$. We assume there exists a unique point $\theta^0$ in the interior of
$\Theta$ such that $\epsilon_t = y_t/\sigma_t$ is i.i.d. with a non-degenerate absolutely continuous distribution
with support $(-\infty, \infty)$, $E[\epsilon_t] = 0$ and $E[\epsilon_t^2] = 1$. Further, $\{y_t, \sigma_t\}$ are stationary and
geometrically $\beta$-mixing. We avoid well known boundary problems by assuming $\theta^0$ lies
in the interior of $\Theta$ and $\sigma_t^2$ has a non-degenerate distribution, hence (1) is a non-trivial
GARCH process. In Bollerslev's [7] classic GARCH model

$\sigma_t^2 = \omega^0 + \alpha^0 y_{t-1}^2 + \beta^0 \sigma_{t-1}^2$

with $\omega^0 > 0$ and $\alpha^0, \beta^0 \geq 0$ this requires $\alpha^0 + \beta^0 > 0$, cf. Andrews [3] and Francq and
Zakoïan [24].

In order to keep technical arguments brief, we assume $\sigma_t^2(\theta) := g(y_{t-1}, \sigma_{t-1}^2(\theta), \theta)$ has
properties similar to a non-trivial classic GARCH model: $\sigma_t^2(\theta)$ is twice continuously differentiable,
$E[(\sup_{\theta \in \Theta} |\sigma_t^2/\sigma_t^2(\theta)|)^p] < \infty$ for any $p > 0$, and $\sup_{\theta \in \mathcal{N}_0} \|(\partial/\partial\theta)^i \ln(\sigma_t^2(\theta))\|$
is $L_{2+\iota}$-bounded for $i = 1, 2$, tiny $\iota > 0$ and some compact $\mathcal{N}_0 \subseteq \Theta$ containing $\theta^0$, where $\|\cdot\|$ is the
matrix norm (cf. Francq and Zakoïan [24]). Similarly, we impose Lipschitz type bounds
on $g$ that ensure an iterated approximation $h_0(\theta) = \tilde{\omega}$ and $h_t(\theta) = g(y_{t-1}, h_{t-1}(\theta), \theta)$
for $t = 1, 2, \ldots$ satisfies $\sup_{\theta \in \Theta} |h_t(\theta) - \sigma_t^2(\theta)| \to 0$ as $t \to \infty$, a key property for feasible
estimation (see Nelson [48], Francq and Zakoïan [24], Straumann and Mikosch [60]).
The above properties of $\sigma_t^2(\theta)$ cover at least Threshold GARCH with a known threshold,
Asymmetric and Nonlinear Asymmetric GARCH, VGARCH, GJR-GARCH, Smooth
Transition GARCH, and Quadratic GARCH. Consult Engle and Ng [22], Carrasco and
Chen [12], Francq and Zakoïan [24, 25] and Meitz and Saikkonen [44, 45]. EGARCH
evidently is not included here since it is unknown whether $\sup_{\theta \in \Theta} |h_t(\theta) - \sigma_t^2(\theta)| \to 0$ as
$t \to \infty$ (see Straumann and Mikosch [60], Meitz and Saikkonen [44, 45]).

We are interested in heavy tailed errors or innovation outliers, in particular we allow
$E[\epsilon_t^4] = \infty$, while GARCH feedback itself may also prompt heavy tails in $y_t$ due to a
stochastic recurrence structure (Basrak et al. [4], Liu [42]). In this paper, we negligibly
transform QML loss or score equations to obtain asymptotically normal estimators of $\theta^0$
allowing for $E[\epsilon_t^4] = \infty$.

Define $\epsilon_t(\theta) := y_t/\sigma_t(\theta)$ and $s_t(\theta) := (\partial/\partial\theta)\ln\sigma_t^2(\theta)$, and let $I(\cdot)$ denote the indicator
function. In Section 2, we tackle the fact that $\sigma_t^2(\theta)$ is not observed for $t \leq 0$. The
first method trims QML criterion equations $p_t(\theta) := \ln(\sigma_t^2(\theta)) + \epsilon_t^2(\theta)$ according to extremes
that arise in a first order expansion and therefore the score $(\epsilon_t^2(\theta) - 1)s_t(\theta)$.
Since $s_t(\theta)$ has an $L_2$-bounded envelope near $\theta^0$ it suffices to minimize
$\sum_{t=1}^n p_t(\theta) I(-l \leq \epsilon_t^2(\theta) - 1 \leq u)$ for some positive thresholds $\{l,u\}$ that increase with the sample size $n$.
Identification of $\theta^0$ coupled with asymptotic normality are assured if $\{l,u\}$ are replaced
with intermediate order statistics of $\epsilon_t^2(\theta) - 1$. The result is the Quasi-Maximum
Tail-Trimmed Estimator (QMTTL), similar to the least tail-trimmed squares estimator for
autoregressions in Hill [31].

The second method imbeds negligibly transformed errors in QML score equations
$(\epsilon_t^2(\theta) - 1)s_t(\theta)$. We then re-center the transformed errors to minimize small sample
bias and estimate $\theta^0$ by the Method of Negligibly Weighted Moments (MNWM). By
re-centering we may simply transform $\epsilon_t(\theta)$ itself symmetrically which requires only one
threshold, for example in the simple trimming case we use $\epsilon_t^2(\theta)I(|\epsilon_t(\theta)| \leq c)$ for some
$c > 0$. In order to simplify proofs we focus on simple trimming, and related bounded but
smooth weighted redescending transforms $\epsilon_t^2(\theta)\varpi(\epsilon_t(\theta),c)I(|\epsilon_t(\theta)| \leq c)$ where $\varpi(\cdot,c)$ is
continuously differentiable in $c$, and $\varpi(\epsilon_t(\theta),c) \to 1$ a.s. as $c \to \infty$. Weights related to
simple indicators include Hampel's three-part function, and smooth transforms include
Tukey's bisquare and an exponential version (cf. Andrews et al. [1], Hampel et al. [27]).
See Sections 2 and 3.

We show how trimming and distribution tail parameters impact efficiency, while the
negligible amount of trimming never affects the asymptotic covariance matrix when
$E[\epsilon_t^4] < \infty$. Fixed quantile trimming or truncation always impact efficiency irrespective
of higher moments, and cause bias due to $\epsilon_t^2 - 1$ having an asymmetric distribution in
general (Sakata and White [58], Mancini et al. [43]). Mancini et al. [43] use simulation
based methods to solve the bias, but this requires knowledge of the error distribution
(see also Cantoni and Ronchetti [11], Ronchetti and Trojani [57]).

The convergence rate of our estimators is $o(\sqrt{n})$ when $E[\epsilon_t^4] = \infty$, but can be assured
to be $\sqrt{n}/g_n$ for any sequence of positive numbers $\{g_n\}$ that satisfies $g_n \to \infty$ as slowly
as we choose by following simple rules of thumb for choosing the threshold $c$. Thus when
$E[\epsilon_t^4] = \infty$ our estimators converge faster than QML (cf. Hall and Yao [26]) but slower
than $\sqrt{n}$-convergent estimators in Peng and Yao [52], Berkes and Horvath [5] and Zhu
and Ling [61], although the latter two are not for standard GARCH models in which
$E[\epsilon_t^2] = 1$ identifies the volatility process. See below for literature details. We do not
tackle optimal threshold selection in order to conserve space. We do, however, show
explicitly how threshold selection impacts the convergence rate which suggests simple
rules for trimming. We also discuss practical considerations for trimming in terms of
small sample bias control. See Sections 2.3 and 2.4.

In Section 4, we show classic inference applies as long as self-normalization is used, 
a nice convenience since tail thickness and the precise rate of convergence need never 
be known. We complete the paper with simulation and empirical studies in Sections 5 
and 6. In particular, we give evidently the first comparison of various heavy tail robust 
estimators for GARCH models, and show our estimators obtain in general lower bias and 
are closer to normally distributed in small samples and therefore lead to better inference. 

A complete theory of QML for a variety of strong-GARCH models is presented in Lee
and Hansen [39], Berkes et al. [6], Francq and Zakoïan [24], Straumann and Mikosch
[60] and Meitz and Saikkonen [45] amongst others, while at least a finite fourth moment
$E[\epsilon_t^4] < \infty$ is standard. The allowance of heavier tails $E[\epsilon_t^4] = \infty$, with Gaussian asymptotics,
evidently only exists for the classic GARCH model, and in most cases requires
a non-Gaussian QML criterion and non-standard moment conditions to ensure Fisher
consistency (i.e., consistency for the true parameter $\theta^0$). Peng and Yao [52] propose
$\sqrt{n}$-convergent Log-LAD, requiring $\ln\epsilon_t^2$ to have a zero median in order to identify $\theta^0$. Berkes
and Horvath [5] characterize a general QML criterion class that potentially allows for Fisher
consistency, $\sqrt{n}$-convergence and asymptotic normality even when $E[\epsilon_t^4] = \infty$. They
treat Gaussian QML, and various non-Gaussian QML like Laplace QML which requires
$E|\epsilon_t| = 1$ and $E[\epsilon_t^2] < \infty$, and Power-Law QML (PQML) with index $\vartheta > 1$ requiring that
$\epsilon_t$ have an infinitesimal moment and $E[|\epsilon_t|/(1 + |\epsilon_t|)] = 1/\vartheta$. Student's t-QML is Fisher
consistent when $\epsilon_t$ is t-distributed, and otherwise may only be consistent for some $\theta \neq \theta^0$
(cf. Newey and Steigerwald [49], Sakata and White [58], Fan et al. [23]).

Zhu and Ling [61] combine Berkes and Horvath's [5] Laplace class with Ling's [41] weighting
method for Weighted Laplace QML (WLQML) under the assumptions $\epsilon_t$ has a zero
median, $E|\epsilon_t| = 1$ and $E[\epsilon_t^2] < \infty$. The estimator is $\sqrt{n}$-convergent and asymptotically
normal when $E[\epsilon_t^4] = \infty$, but the suggested weights at time $t$ are based on the infinite
past $y_{t-1}, y_{t-2}, \ldots$. Although the authors use a central order statistic for a threshold and
fix $y_t = 0$ for $t \leq 0$ in the weights for the sake of simulations, they do not prove either
is valid. Indeed, for a GARCH(1,1) the restriction $y_t = 0$ for $t \leq 0$ in their weight (2.4)
does not support asymptotic normality (see Zhu and Ling [61], Assumption 2.4 and the
discussion on weight (2.4)). Thus, the estimator is not evidently feasible.

Assumptions like $E|\epsilon_t| = 1$ or $E[|\epsilon_t|/(1 + |\epsilon_t|)] = 1/\vartheta$ replace the usual $E[\epsilon_t^2] = 1$ to
identify $\theta^0$. Of course, if $E[\epsilon_t^2] \neq 1$ then model (1) is not a standard GARCH model since
$E[y_t^2 | y_{t-1}, y_{t-2}, \ldots] \neq \sigma_t^2$ with positive probability is possible, and Gaussian QML leads
to asymptotic bias. Thus, asymptotic normality and Fisher consistency are assured
precisely by changing the criterion and model assumptions and therefore the model
by imposing a non-standard moment condition. In practice, this may be untenable as
many analysts in economics and finance first impose a version of (1) with $E[\epsilon_t^2] = 1$ and
then seek a robust estimator. In order to sidestep such unpleasant moment conditions,
Fan et al. [23] introduce a three-step non-Gaussian QML method. In the first stage,
Gaussian QML residuals are generated. In a second stage, a scale parameter is estimated
to ensure identification in the third non-Gaussian QML stage without imposing non-standard
moment conditions. See also Newey and Steigerwald [49]. Our QMTTL and
MNWM estimators are computed in one step and are asymptotically normal and Fisher
consistent by imposing negligible weighting on extremes couched in a Gaussian QML
criterion.

Evidently simulation experiments demonstrating the robustness properties of Peng and
Yao's [52] Log-LAD, Berkes and Horvath's [5] non-Gaussian QML and Zhu and Ling's
[61] WLQML do not exist, while Fan et al. [23] only inspect the root-mean squared
error of their estimator which masks possible bias. In general, the empirical bias and 
approximate normality properties of these estimators, as well as their ability to gain 
accurate inference in small samples (e.g., Wald tests), are unknown. 

In a simulation experiment, we show QMTTL and MNWM trump QML, Log-LAD,
WLQML, and PQML in all cases in terms of bias, approximate normality and t-test performance,
and have lower mean-squared-error than every estimator except PQML (PQML
has higher bias and lower dispersion). Overall QMTTL performs best. The dominant performance
of QMTTL and MNWM follows since only they directly counter the influence of
large errors in small and large samples by trimming observations with an error extreme.
We show this matters even when $\epsilon_t$ is Gaussian: negligible trimming always improves
QML performance, while untrimmed QML, Log-LAD, WLQML and PQML are comparatively
more sensitive to large errors. Moreover, even PQML, which we design as in
Berkes and Horvath [5] to ensure identification for Paretian errors with an infinite fourth
moment, has greater bias and is farther from normality in small samples than QMTTL
and MNWM. Thus, the advantages of non-Gaussian QML for GARCH processes with
heavy tailed errors are not clear, at least as seen by our controlled experiments. We
emphasize this last point by tail-trimming PQML in a way that removes adverse sample
extremes and leaves the estimator asymptotically unbiased. We show in most cases tail
trimming helps PQML in terms of bias, approximate normality and inference, yet overall
QMTTL is still better. Indeed, PQML is infeasible unless the tail index of $\epsilon_t$ is known or
estimated using some filtration for $\epsilon_t$ (e.g., QML residuals), and is not Fisher consistent
if $\epsilon_t$ has any other distribution.

In the literature on additive outlier robust estimation, negligible trimming is an example
of a redescending transformation $\psi$ where in general $\psi(u) \to 0$ as $|u| \to \infty$,
and typically $\psi(u) = 0$ when $|u| > c$ for some $c$ as we use here. See Huber [34] and Hampel
et al. [27]. Evidently a complete theory of redescending M-estimators exists only for estimates
of location for i.i.d. data (Shevlyakov and Shurygin [59]). In this paper, our QML
estimator has a score equation that effectively uses $\psi(\epsilon_t) = (\epsilon_t^2 - 1)I(-l \leq \epsilon_t^2 - 1 \leq u)$
where $l, u \to \infty$ as $n \to \infty$. Our Method of Moments estimator is more generic since it
uses either re-centered $\psi(\epsilon_t) = \epsilon_t^2 I(|\epsilon_t| \leq c)$ with $c \to \infty$ as $n \to \infty$, or related variants
like Hampel's three-part weight, as well as smooth weights like Tukey's bisquare. In all
cases, the increasing thresholds ensure bias is eradicated asymptotically.
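
The redescending transforms described above can be sketched numerically. The following is a minimal illustration, not the estimator itself: the function names are hypothetical, the bisquare weight is written in its textbook form $(1 - (u/c)^2)^2$ on $|u| \leq c$, and the re-centering step used by the Method of Moments estimator is omitted. It shows only that, for a fixed error, both transforms of the squared error return to the untransformed value as the threshold $c$ grows.

```python
def trim_weight(u, c):
    """Simple trimming weight I(|u| <= c)."""
    return 1.0 if abs(u) <= c else 0.0


def bisquare_weight(u, c):
    """Tukey's bisquare weight (1 - (u/c)^2)^2 on |u| <= c, zero outside."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0


def transformed_square(eps, c, weight):
    """Negligibly transformed squared error eps^2 * weight(eps, c)."""
    return eps * eps * weight(eps, c)


# Hypothetical error value; as c grows both transforms approach eps^2 = 9.
eps = 3.0
for c in (2.0, 5.0, 50.0, 5000.0):
    print(c,
          transformed_square(eps, c, trim_weight),
          transformed_square(eps, c, bisquare_weight))
```

Simple trimming switches abruptly from 0 to the full value once $c$ exceeds $|\epsilon_t|$, whereas the bisquare weight approaches 1 smoothly, which is the property that makes it continuously differentiable in $c$.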

We ignore additive or isolated outliers, and so-called one-off events in $\{y_t\}$ for the
sake of brevity. In this case, we would observe $y_t = y_t^* + x_t$ where $y_t^*$ is generated by
(1) and, for example, $x_t = 0$ in most periods $t$. The challenge here is controlling the
propagation of an aberrant observation due to $x_t \neq 0$ through the volatility mechanism.
See, for example, Charles and Darne [14], Muler and Yohai [47], and Boudt et al. [8], and
see Mendes [18] for anecdotal evidence of QML estimator bias. Incorporating additive
outliers in (1) with innovation outliers would require additional robustness techniques
like those employed in these and related papers (e.g., Muler et al. [46]). Some methods,
however, are proposed to detect outliers in a GARCH process under the assumption of
thin tailed errors: a few large values are simply assumed to be due to a non-heavy tailed
outlier.^1 Other estimators, contrary to claims, do not identify $\theta^0$ and/or are not robust
to heavy tailed errors.^2 Further, all such robust estimators are proposed for the classic
GARCH model, hence existing theory does not necessarily extend to the broader model
class (1).

Finally, our methods can be easily extended to higher order GARCH models, GARCH-in-Mean,
and models of the conditional mean and variance like nonlinear ARMA-GARCH,
as well as other estimators like non-Gaussian QML (Berkes and Horvath [5],
Zhu and Ling [61], Fan et al. [23]), LAD (Peng and Yao [52]), etc. We show trimming
matters for PQML in our simulation study, and we expect negligible trimming to improve
upon non-Gaussian QML estimators in general, provided they are Fisher consistent in
the first place.

^1 Charles and Darne [14] extend ideas developed in (author?) to test for, and control, additive and
innovation outliers in a GARCH process with Gaussian errors. These papers do not provide asymptotic
theory, hence the Gaussian assumption can likely be relaxed. The trimming methods used in the present
paper can be extended to their test statistics which involve a residual variance estimator (cf. Hill [31],
Hill and Aguilar [33]), but a rigorous theory would need to be developed.

^2 Muler and Yohai [47] present a robust M-estimator $\hat{\theta}_n$ based on a loss $\rho(\ln(y_t^2) - \ln(h_t^*(\theta)))$, where
$h_t^*(\theta)$ is a filtered version of $\sigma_t^2(\theta)$ that restricts the propagation of outliers. They assume $\rho$ is thrice
continuously differentiable with bounded derivatives. Although claimed to be heavy tail robust and
identify the true $\theta^0$ (see their Theorem 3), they do not prove any such $\rho$ exists. In their simulations, for
example, they use truncated QML with $\rho(u) = \psi_c(\exp\{u\} - u)$ where $\psi_c$ truncates at a fixed threshold
$c$: $\psi_c(x) = K$ for all $x > c$. Thus $\rho(u)$ is non-differentiable at $\exp\{u\} - u = c$, and at all other points no
derivative is bounded which implies non-robustness to heavy tails. The problem is the QML score is not
bounded when $\rho(u)$ is truncated according to its large values. Our approach, however, negligibly trims
according to properties of the QML score and therefore ensures heavy tail robustness and identification
of $\theta^0$.

We use the following notation conventions. The indicator function $I(\cdot)$ is $I(a) = 1$ if $a$
is true, and otherwise $I(a) = 0$. The spectral norm of matrix $A$ is $\|A\| = \lambda_{\max}(A'A)^{1/2}$
with $\lambda_{\max}(\cdot)$ the maximum eigenvalue. If $z$ is a scalar, we write $(z)_+ := \max\{0,z\}$. $K$
denotes a positive finite constant whose value may change from line to line; $\iota > 0$ is
an arbitrarily tiny constant. $\stackrel{p}{\to}$ and $\stackrel{d}{\to}$ denote probability and distribution convergence.
$x_n \sim a_n$ implies $x_n/a_n \to 1$. $L(n)$ is a slowly varying function that may change with the
context.


2. Quasi-maximum tail-trimmed likelihood 

The observed sample is $\{y_t\}_{t=0}^n$ with sample size $n + 1 > 1$. We start at $t = 0$ to simplify
notation since we condition on the first observation $y_0$ and a volatility constant defined
below. Estimation requires a volatility function on $\Theta$,

$\sigma_t^2(\theta) = g(y_{t-1}, \sigma_{t-1}^2(\theta), \theta)$,

hence $\sigma_t^2 = \sigma_t^2(\theta^0)$. It is convenient to assume $\Theta$ is a compact subset of points $\theta$ on which
$\sigma_t^2(\theta)$ is stationary:

$\Theta \subset \{\theta \in \mathbb{R}^q : \{\sigma_t^2(\theta)\}$ has a stationary solution$\}$. (2)

In practice $\sigma_t^2(\theta)$ for $t \leq 0$ is not observed, so define an iterated volatility approximation

$h_0(\theta) = \tilde{\omega} > 0$ and $h_t(\theta) = g(y_{t-1}, h_{t-1}(\theta), \theta)$ for $t = 1, 2, \ldots$, (3)

where $\tilde{\omega}$ is not necessarily an element of $\Theta$. We initially develop an infeasible robust
estimator based on the QML equations $\ln\sigma_t^2(\theta) + y_t^2/\sigma_t^2(\theta)$. We then show a feasible
version based on $\ln h_t(\theta) + y_t^2/h_t(\theta)$ has the same limit distribution.
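
The iterated approximation in (3) can be illustrated with a short sketch. It assumes the classic GARCH(1,1) response $g(y, s, \theta) = \omega + \alpha y^2 + \beta s$, Gaussian errors, and hypothetical parameter values and function names chosen only for illustration. In this special case the gap between $h_t(\theta^0)$ and the true $\sigma_t^2$ shrinks by the factor $\beta$ each period, so a poor starting constant is forgotten geometrically fast.

```python
import math
import random


def garch_step(y_prev, s_prev, theta):
    """Classic GARCH(1,1) response g(y, s, theta) = omega + alpha*y^2 + beta*s."""
    omega, alpha, beta = theta
    return omega + alpha * y_prev ** 2 + beta * s_prev


def simulate(n, theta, seed=0):
    """Simulate y_t = sigma_t * eps_t for t = 0..n with Gaussian eps_t (assumption)."""
    rng = random.Random(seed)
    omega, alpha, beta = theta
    s = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    ys, sigma2 = [], []
    for _ in range(n + 1):
        y = math.sqrt(s) * rng.gauss(0.0, 1.0)
        ys.append(y)
        sigma2.append(s)
        s = garch_step(y, s, theta)
    return ys, sigma2


def iterated_volatility(ys, theta, h0):
    """h_0 = h0 and h_t = g(y_{t-1}, h_{t-1}, theta) for t = 1..n."""
    hs = [h0]
    for t in range(1, len(ys)):
        hs.append(garch_step(ys[t - 1], hs[t - 1], theta))
    return hs


theta0 = (0.1, 0.1, 0.8)            # hypothetical (omega, alpha, beta)
ys, sigma2 = simulate(500, theta0)
h = iterated_volatility(ys, theta0, h0=5.0)  # deliberately poor start
gaps = [abs(a - b) for a, b in zip(h, sigma2)]
print(gaps[0], gaps[10], gaps[100])
```

For other response functions $g$ the decay need not be exactly geometric in $\beta$; the Lipschitz bounds imposed later are what deliver the analogous property in general.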


2.1. Tail-trimming 

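In order to understand when and where trimming should be applied, define the GARCH
error function, and a scaled volatility function and its derivative

$\epsilon_t(\theta) := \frac{y_t}{\sigma_t(\theta)} = \frac{y_t}{g^{1/2}(y_{t-1}, \sigma_{t-1}^2(\theta), \theta)}$,

$s_t(\theta) := \frac{1}{\sigma_t^2(\theta)}\frac{\partial}{\partial\theta}\sigma_t^2(\theta)$ and $d_t(\theta) := \frac{\partial}{\partial\theta}s_t(\theta)$.

Throughout, we drop $\theta^0$ and write $\epsilon_t = \epsilon_t(\theta^0)$, $s_t = s_t(\theta^0)$, $d_t = d_t(\theta^0)$ and so on. Gaussian
asymptotics for QML are grounded on the score equations $m_t(\theta)$ and their Jacobian $G_t(\theta)$:

$m_t(\theta) := (\epsilon_t^2(\theta) - 1)s_t(\theta)$ and

$G_t(\theta) := \frac{\partial}{\partial\theta}m_t(\theta) = (\epsilon_t^2(\theta) - 1)d_t(\theta) - \epsilon_t^2(\theta)s_t(\theta)s_t'(\theta)$.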

We assume $s_t(\theta)$ and $d_t(\theta)$ have $L_{2+\iota}$-bounded envelopes near $\theta^0$ for tiny $\iota > 0$, thus
asymptotic normality hinges entirely on $\epsilon_t^2 - 1$. See below for all assumptions. It therefore
suffices to trim $\ln\sigma_t^2(\theta) + \epsilon_t^2(\theta)$ negligibly when $\epsilon_t^2(\theta) - 1$ surpasses a large negative or
positive threshold. As long as those thresholds represent intermediate order statistics, we
can identify $\theta^0$ and have an asymptotically normal estimator. Write

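$\mathcal{E}_t(\theta) := \epsilon_t^2(\theta) - 1$,

and denote left and right tail observations and their order statistics for $\mathcal{E}_t(\theta)$:

$\mathcal{E}_t^{(-)}(\theta) := \mathcal{E}_t(\theta)I(\mathcal{E}_t(\theta) < 0)$ and $\mathcal{E}_{(1)}^{(-)}(\theta) \leq \cdots \leq \mathcal{E}_{(n)}^{(-)}(\theta) \leq 0$,

$\mathcal{E}_t^{(+)}(\theta) := \mathcal{E}_t(\theta)I(\mathcal{E}_t(\theta) \geq 0)$ and $\mathcal{E}_{(1)}^{(+)}(\theta) \geq \cdots \geq \mathcal{E}_{(n)}^{(+)}(\theta) \geq 0$.

The determination of the number of trimmed large $\mathcal{E}_t(\theta)$ in a sample of size $n$ is made
by intermediate order sequences $\{k_{1,n}, k_{2,n}\}$, hence (e.g., Leadbetter et al. [38])

$k_{i,n} \in \{1, \ldots, n-1\}$, $k_{i,n} \to \infty$ and $k_{i,n}/n \to 0$.

Define an indicator selection function for trimming

$\hat{I}_{n,t}^{(\mathcal{E})}(\theta) := I(\mathcal{E}_{(k_{1,n})}^{(-)}(\theta) \leq \mathcal{E}_t(\theta) \leq \mathcal{E}_{(k_{2,n})}^{(+)}(\theta))$.

The QMTTL estimator therefore solves

$\hat{\theta}_n = \arg\min_{\theta \in \Theta}\left\{\frac{1}{n}\sum_{t=1}^n (\ln\sigma_t^2(\theta) + \epsilon_t^2(\theta)) \times \hat{I}_{n,t}^{(\mathcal{E})}(\theta)\right\} = \arg\min_{\theta \in \Theta}\{\hat{Q}_n(\theta)\}$. (4)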

Each $k_{i,n}$ represents the number of trimmed $\ln\sigma_t^2(\theta) + \epsilon_t^2(\theta)$ due to large negative or
positive $\mathcal{E}_t(\theta) = \epsilon_t^2(\theta) - 1$. We require $k_{i,n} \to \infty$ for asymptotic normality, while negligibility
$k_{i,n}/n \to 0$ ensures identification of $\theta^0$ asymptotically. Since $\mathcal{E}_t(\theta)$ in general has
an asymmetric distribution, identification of $\theta^0$ is assured asymptotically if we negligibly
trim asymmetrically by $\mathcal{E}_t(\theta)$. In a method of moments framework, however, we can
re-center trimmed errors allowing for symmetric trimming where negative and positive
thresholds are the same: see Section 3.
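
The trimmed criterion can be sketched for a given volatility path as follows. This is a minimal illustration under stated assumptions: the function name `qmttl_criterion` is hypothetical, the volatilities are supplied directly rather than computed from a parameter $\theta$, and weak inequalities are used so that the $k$-th order statistic itself is retained.

```python
import math
import random


def qmttl_criterion(ys, sigma2, k1, k2):
    """Tail-trimmed Gaussian QML criterion for given volatilities sigma2[t].

    Trims observations whose E_t = eps_t^2 - 1 lies below the k1-th smallest
    left-tail value or above the k2-th largest right-tail value.
    """
    n = len(ys)
    crit = [math.log(s) + y * y / s for y, s in zip(ys, sigma2)]
    e = [y * y / s - 1.0 for y, s in zip(ys, sigma2)]
    left = sorted(min(v, 0.0) for v in e)                  # E^(-)_(1) <= ... <= 0
    right = sorted((max(v, 0.0) for v in e), reverse=True)  # E^(+)_(1) >= ... >= 0
    lo, hi = left[k1 - 1], right[k2 - 1]                    # order-statistic thresholds
    keep = [lo <= v <= hi for v in e]
    value = sum(c for c, flag in zip(crit, keep) if flag) / n
    return value, keep


# Hypothetical data: i.i.d. Gaussian draws with unit volatility.
rng = random.Random(1)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]
sigma2 = [1.0] * 200
value, keep = qmttl_criterion(ys, sigma2, k1=5, k2=3)
print(value, sum(keep))
```

In an actual estimation routine the volatilities and hence the thresholds would be recomputed at every candidate $\theta$, since the order statistics of $\mathcal{E}_t(\theta)$ depend on $\theta$.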

In practical terms, $\hat{\theta}_n$ can be easily computed using standard iterative optimization
routines. In fact, under distribution continuity, arguments developed in Cizek
[16], Lemma 2.1, page 29, apply for almost sure twice differentiability of the otherwise
non-differentiable $\hat{Q}_n(\theta)$. In particular, since $(\partial/\partial\theta)(\ln\sigma_t^2(\theta) + \epsilon_t^2(\theta)) = -m_t(\theta)$, we have
almost surely $(\partial/\partial\theta)\hat{Q}_n(\theta) = -1/n\sum_{t=1}^n m_t(\theta)\hat{I}_{n,t}^{(\mathcal{E})}(\theta)$
and $(\partial/\partial\theta)^2\hat{Q}_n(\theta) = -1/n\sum_{t=1}^n G_t(\theta)\hat{I}_{n,t}^{(\mathcal{E})}(\theta)$. This implies standard
estimation algorithms that exploit the gradient and Hessian apply.

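In order to characterize the limit distribution of $\hat{\theta}_n$ we require non-random quantiles
which the order statistics $\mathcal{E}_{(k_{1,n})}^{(-)}(\theta)$ and $\mathcal{E}_{(k_{2,n})}^{(+)}(\theta)$ approximate. Define sequences
$\{\mathcal{L}_n(\theta), \mathcal{U}_n(\theta)\}$ denoting the lower $k_{1,n}/n$ and upper $k_{2,n}/n$ quantiles of $\mathcal{E}_t(\theta)$:

$P(\mathcal{E}_t(\theta) \leq -\mathcal{L}_n(\theta)) = \frac{k_{1,n}}{n}$ and $P(\mathcal{E}_t(\theta) \geq \mathcal{U}_n(\theta)) = \frac{k_{2,n}}{n}$. (5)

The selection indicator is then

$I_{n,t}^{(\mathcal{E})}(\theta) := I(-\mathcal{L}_n(\theta) \leq \mathcal{E}_t(\theta) \leq \mathcal{U}_n(\theta))$.

Notice $\mathcal{E}_t(\theta) \in [-1,\infty)$ and $k_{i,n}/n \to 0$ imply $\mathcal{L}_n(\theta) \to 1$ and $\mathcal{U}_n(\theta) \to \infty$. The quantiles
$\{\mathcal{L}_n(\theta), \mathcal{U}_n(\theta)\}$ exist for each $\theta$ and any choice of fractiles $\{k_{1,n}, k_{2,n}\}$ since $\epsilon_t$ has a
smooth distribution. By construction the order statistics $\{\mathcal{E}_{(k_{1,n})}^{(-)}(\theta), \mathcal{E}_{(k_{2,n})}^{(+)}(\theta)\}$ estimate
$\{-\mathcal{L}_n(\theta), \mathcal{U}_n(\theta)\}$, and are uniformly consistent in view of the $\beta$-mixing condition detailed
in Assumption 1 below, for example $\sup_{\theta \in \Theta}|\mathcal{E}_{(k_{2,n})}^{(+)}(\theta)/\mathcal{U}_n(\theta) - 1| = O_p(1/k_{2,n}^{1/2})$. See
Appendix A.3 for supporting limit theory.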

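Finally, define equation variances $\Sigma_n$ and $S_n$, and a scale $V_n$ for standardizing $\hat{\theta}_n$:

$\Sigma_n := E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}] \times E[s_t s_t']$ and

$S_n := E\left[\left(\frac{1}{n^{1/2}}\sum_{t=1}^n\{m_t I_{n,t}^{(\mathcal{E})} - E[m_t I_{n,t}^{(\mathcal{E})}]\}\right)\left(\frac{1}{n^{1/2}}\sum_{t=1}^n\{m_t I_{n,t}^{(\mathcal{E})} - E[m_t I_{n,t}^{(\mathcal{E})}]\}\right)'\right]$,

$V_n = [V_{i,j,n}]_{i,j=1}^q := nE[s_t s_t']S_n^{-1}E[s_t s_t'] \sim \frac{n}{E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}]}E[s_t s_t']$.

The scale form $V_n = nE[s_t s_t']S_n^{-1}E[s_t s_t']$ is standard for M-estimators. In view of identification
Assumption 2 and equation (6), below, and independence it is easily verified that
the long-run variance satisfies $S_n = \Sigma_n(1 + o(1))$. Thus $V_n \sim n(E[\epsilon_t^4 I_{n,t}^{(\mathcal{E})}] - 1)^{-1}E[s_t s_t']$,
which is positive definite for our data generating process.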


2.2. Main results 

We require two assumptions concerning the error distribution, properties of the volatility
response $g$, and parameter identification. Let $\kappa$ denote the moment supremum of $\epsilon_t$:

$\kappa := \arg\sup\{\alpha > 0 : E|\epsilon_t|^\alpha < \infty\} > 2$.

Assumption 1 (Data generating process).

(a) There exists a unique point $\theta^0 = [\omega^0, \alpha^0, \beta^0]'$ in the interior of a compact subset $\Theta$
of $\mathbb{R}^q$ such that $\epsilon_t = y_t/\sigma_t$ is i.i.d., $E[\epsilon_t] = 0$ and $E[\epsilon_t^2] = 1$.

(b) $\epsilon_t$ has an absolutely continuous, non-degenerate, and uniformly bounded distribution
on $(-\infty, \infty)$: $\sup_{a \in \mathbb{R}}\{(\partial/\partial a)P(\epsilon_t \leq a)\} < \infty$. If $E[\epsilon_t^4] = \infty$ then $P(|\epsilon_t| > a) =
da^{-\kappa}(1 + o(1))$, where $d > 0$ and $\kappa \in (2,4]$.

(c) $\sigma_t^2(\theta)$ is twice continuously differentiable in $\theta$; $(\partial/\partial\theta)^i g(\cdot,\cdot,\theta)$ is for each $\theta \in
\Theta$ and $i = 0,1,2$ Borel measurable; $E[\sup_{\theta \in \Theta}|\sigma_t^2/\sigma_t^2(\theta)|^p] < \infty$ for any $p > 0$;
$E[\sup_{\theta \in \mathcal{N}_0}\|(\partial/\partial\theta)^i\ln(\sigma_t^2(\theta))\|^{2+\iota}] < \infty$ for $i = 1, 2$, tiny $\iota > 0$, and some compact
$\mathcal{N}_0 \subseteq \Theta$ containing $\theta^0$ and having positive Lebesgue measure.

(d) $\{y_t\}$ and $\{\sigma_t^2(\theta)\}$ for $\theta \in \Theta$ are stationary and geometrically $\beta$-mixing.

Remark 1. The tail index $\kappa$ in (b) is identically the moment supremum (see Resnick
[55]). The volatility moment bounds in (c) imply only the tails of $\epsilon_t$ matter for Gaussian
asymptotics, and can be relaxed at the expense of added notation for trimming also
according to $s_t$. Verification of (c) for the classic GARCH model is in Francq and Zakoïan
[24], and related proofs for asymmetric models are in Francq and Zakoïan [25].

Remark 2. Geometric $\beta$-mixing (d) implies mixing in the ergodic sense, hence ergodicity
(see Petersen [53]). Lipschitz type conditions on the volatility response $g$ combined
with a smooth bounded distribution for $\epsilon_t$ suffice, covering a large variety of models
(Carrasco and Chen [12], Straumann and Mikosch [60], Meitz and Saikkonen [44], Meitz
and Saikkonen [45]). See Theorem 2.3 below for one such set of conditions. In the classic
GARCH model $y_t = \sigma_t\epsilon_t$ and $\sigma_t^2(\theta) = \omega + \alpha y_{t-1}^2 + \beta\sigma_{t-1}^2(\theta)$, for example, $\omega > 0$,
$\alpha, \beta \geq 0$ and $E[\ln(\alpha^0\epsilon_t^2 + \beta^0)] < 0$ ensure stationarity and ergodicity, and combined with
$E[\epsilon_t^2] = 1$ this allows for IGARCH and mildly explosive cases $\alpha^0 + \beta^0 \geq 1$ (Nelson [48]). If
additionally $\epsilon_t$ has a continuous distribution that is positive on $(-\infty, \infty)$ then $\{y_t, \sigma_t^2(\theta)\}$
are geometrically $\beta$-mixing (Carrasco and Chen [12]).
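
The stationarity condition $E[\ln(\alpha^0\epsilon_t^2 + \beta^0)] < 0$ in Remark 2 is easy to check by simulation. The sketch below is illustrative only: it assumes standard Gaussian errors, and the function name and parameter values are hypothetical. It shows that an IGARCH case with $\alpha^0 + \beta^0 = 1$ still satisfies the condition (by Jensen's inequality the expectation is strictly below $\ln 1 = 0$), while a strongly explosive case does not.

```python
import math
import random


def log_moment(alpha, beta, draws=100_000, seed=0):
    """Monte Carlo estimate of E[ln(alpha*eps^2 + beta)] for Gaussian eps (assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        e = rng.gauss(0.0, 1.0)
        total += math.log(alpha * e * e + beta)
    return total / draws


igarch = log_moment(0.2, 0.8)     # alpha + beta = 1: condition holds
explosive = log_moment(5.0, 1.0)  # ln(5*eps^2 + 1) >= 0 always: condition fails
print(igarch, explosive)
```

With heavier tailed errors normalized to unit variance the same check applies after replacing the Gaussian draw with the relevant error generator.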

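In the Appendices, we show $\hat{\theta}_n$ obtains the expansion $V_n^{1/2}(\hat{\theta}_n - \theta^0) = n^{-1/2}\Sigma_n^{-1/2}\sum_{t=1}^n m_t I_{n,t}^{(\mathcal{E})} \times (1 + o_p(1))$,
hence $E[m_t I_{n,t}^{(\mathcal{E})}] \to 0$ must hold for asymptotic unbiasedness
of $\hat{\theta}_n$. This reduces to assuming $n^{1/2}(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{-1/2}E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \to 0$ since by independence
$E[m_t I_{n,t}^{(\mathcal{E})}] = E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \times E[s_t]$, while $\Sigma_n = E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}] \times E[s_t s_t']$ and $\|E[s_t s_t']\| \in (0,\infty)$.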

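Assumption 2 (Identification). The fractile sequences $\{k_{1,n}, k_{2,n}\}$ satisfy
$n^{1/2}(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{-1/2}E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \to 0$ where $\mathcal{E}_t := \epsilon_t^2 - 1$.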

Remark 3. We do not require $E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] = 0$ for finite $n$ since our results are asymptotic,
while $E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \to E[\epsilon_t^2 - 1] = 0$ automatically holds by dominated convergence and negligibility
$k_{i,n}/n = o(1)$. Since $n^{1/2}/(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{1/2} \to \infty$ as verified in Section 2.4 below,
we require $E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \to 0$ fast enough, else there is asymptotic bias.

Remark 4. There always exists a sequence $\{k_{1,n}, k_{2,n}\}$ such that $E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}]$ is closer
to zero than $(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{1/2}/n^{1/2}$ as $n$ increases. In general $\mathcal{E}_t \in [-1,\infty)$ is skewed right
hence, counterintuitively, asymptotic unbiasedness requires $k_{1,n} > k_{2,n}$: a few trimmed
large positive values promotes asymptotic normality, but forces us to trim many negative
values to ensure identification. See Section 2.3 for discussion and examples. In a method
of moments framework, however, identification is assured by re-centering the trimmed
errors, hence Assumption 2 is not required. See Section 3.


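Remark 5. Define $m_{n,t} := m_t I_{n,t}^{(\mathcal{E})}$. Assumption 2 ensures $E[\{m_{n,s} - E[m_{n,s}]\}\{m_{n,t} -
E[m_{n,t}]\}'] = E[m_{n,s}m_{n,t}'] + o(\|\Sigma_n\|/n)$ for all $s,t$, and $\|\sum_{i=1}^{n-1}(1 - i/n)E[m_{n,1}m_{n,i+1}']\| \leq n \times
o(\|\Sigma_n\|/n) = o(\|\Sigma_n\|)$ by Minkowski and Cauchy-Schwarz inequalities. Hence, $\Sigma_n$ is
asymptotically equal to the long-run covariance matrix $S_n$ of $\{m_{n,t}\}$
since

$S_n = E\left[\left(\frac{1}{n^{1/2}}\sum_{t=1}^n\{m_{n,t} - E[m_{n,t}]\}\right)\left(\frac{1}{n^{1/2}}\sum_{t=1}^n\{m_{n,t} - E[m_{n,t}]\}\right)'\right]$ (6)

$= \Sigma_n \times (1 + o(1)) + 2\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)E[m_{n,1}m_{n,i+1}'] = \Sigma_n \times (1 + o(1))$.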


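We are now ready to state the main results of this section. The expansion $V_n^{1/2}(\hat{\theta}_n -
\theta^0) = n^{-1/2}\Sigma_n^{-1/2}\sum_{t=1}^n m_t I_{n,t}^{(\mathcal{E})} \times (1 + o_p(1))$ requires Jacobian consistency $1/n\sum_{t=1}^n G_t(\hat{\theta}_n) \times
\hat{I}_{n,t}^{(\mathcal{E})}(\hat{\theta}_n) \stackrel{p}{\to} -E[s_t s_t']$ and therefore consistency $\hat{\theta}_n \stackrel{p}{\to} \theta^0$ from first principles. Proofs of
main results are contained in Appendices A.1 and A.2.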

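Theorem 2.1 (QMTTL consistency). Under Assumptions 1 and 2 $\hat{\theta}_n \stackrel{p}{\to} \theta^0$.


Theorem 2.2 (QMTTL normality). Under Assumptions 1 and 2 $V_n^{1/2}(\hat{\theta}_n - \theta^0) \stackrel{d}{\to}
N(0, I_q)$ where $V_n = nE[s_t s_t']S_n^{-1}E[s_t s_t'] \sim n(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{-1}E[s_t s_t']$ and each $V_{i,i,n} \to \infty$.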

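Now consider feasible QMTTL. Define $\tilde{\epsilon}_t^2(\theta) := y_t^2/h_t(\theta)$ based on the iterated process
$\{h_t(\theta)\}$ in (3), and $\tilde{\mathcal{E}}_t(\theta) := \tilde{\epsilon}_t^2(\theta) - 1$. The feasible estimator is

$\tilde{\theta}_n = \arg\min_{\theta \in \Theta}\left\{\frac{1}{n}\sum_{t=1}^n(\ln h_t(\theta) + \tilde{\epsilon}_t^2(\theta)) \times I(\tilde{\mathcal{E}}_{(k_{1,n})}^{(-)}(\theta) \leq \tilde{\mathcal{E}}_t(\theta) \leq \tilde{\mathcal{E}}_{(k_{2,n})}^{(+)}(\theta))\right\}$.

Under the following Lipschitz bounds for the response $g$ and its derivatives we show $\tilde{\theta}_n$
has the same limit distribution as the infeasible $\hat{\theta}_n$, cf. Meitz and Saikkonen [44]. Related
ideas are contained in Straumann and Mikosch [60].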

Drop arguments: $g = g(y,s,\theta)$, and let $g_a$ and $g_{a,b}$ denote first and second derivatives
for $a,b \in \{y,s,\theta\}$. We say a matrix function $\zeta(y,s,\theta)$ is Lipschitz in $s$ if $\|\zeta(y,s_1,\theta) -
\zeta(y,s_2,\theta)\| \leq K|s_1 - s_2|$ $\forall s_1,s_2 \in [0,\infty)$ and $y,\theta \in \mathbb{R} \times \Theta$.


Assumption 3 (Response bounds).
(a) $g \leq \rho s + K(1 + y^2)$ for some $\rho \in (0,1)$ and $\inf_{y \in \mathbb{R}, s \in \mathbb{R}_+, \theta \in \Theta}\{|g|\} > 0$;

(b) $\|g_a\|$ and $\|g_{a,b}\|$ are bounded by $K(1 + y^2 + s)$ for each $a,b \in \{y,\theta\}$;

(c) $g$, $g_a$ and $g_{a,b}$ are Lipschitz in $s$, for each $a,b \in \{y,s,\theta\}$.

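Assumption 3 ensures $h_t(\theta)$, $h_t^\theta(\theta) := (\partial/\partial\theta)h_t(\theta)$ and $h_t^{\theta,\theta}(\theta) := (\partial/\partial\theta)h_t^\theta(\theta)$ have stationary
ergodic solutions $\{h_t^*(\theta), h_t^{\theta*}(\theta), h_t^{\theta,\theta*}(\theta)\}$ with the geometric property
$E[(\sup_{\theta \in \Theta}|a_t^*(\theta) - a_t(\theta)|)^\iota] = o(\rho^t)$ for each $a_t(\theta) \in \{h_t(\theta), h_t^\theta(\theta), h_t^{\theta,\theta}(\theta)\}$ and $a_t^*(\theta) \in
\{h_t^*(\theta), h_t^{\theta*}(\theta), h_t^{\theta,\theta*}(\theta)\}$ and some $\rho \in (0,1)$. See Lemma A.7 in Appendix A.2. This leads
to the next result.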

Theorem 2.3 (Feasible QMTTL). Under Assumptions 1-3 $V_n^{1/2}(\tilde{\theta}_n - \hat{\theta}_n) \stackrel{p}{\to} 0$.

Remark 6. In the remainder of the paper, we focus on the infeasible $\hat{\theta}_n$ for notational
economy.


As stated above, we need only trim by error extremes since first order asymptotics
rests solely on whether $\epsilon_t$ has a fourth moment or not. However, in small samples a
large $y_{t-1}$ may cause $s_t$ or $d_t$ to spike and therefore the score equation to exhibit a
sample extreme value. Consider, for example, that in the linear volatility model $\sigma_t^2(\theta) =
\omega + \alpha y_{t-1}^2 + \beta\sigma_{t-1}^2(\theta)$ the score weight at the origin $s_t(\theta)|_{\alpha,\beta=0} = \omega^{-1} \times [1, y_{t-1}^2, \omega]'$ obtains
an extreme value if and only if $|y_{t-1}|$ does. In general $s_t$ exhibits spikes when $|y_{t-1}|$ does
for $\alpha^0$ and $\beta^0$ near zero. This same property applies to a large variety of GARCH models.
Thus, although $\hat{\theta}_n$ is consistent and asymptotically normal, for improved small sample
performance trimming by large values of $y_{t-1}$ appears to be highly useful in practice.
This is not surprising since true additive outliers render QML biased (see Mendes [18],
Muler and Yohai [47], cf. Cavaliere and Georgiev [13], Muler et al. [46]).

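Let $\{k_n\}$ be an intermediate order sequence and define $\hat{I}_{n,t}^{(y)} := I(|y_t| \leq y_{(k_n)}^{(a)})$ where
$y_{(i)}^{(a)}$ are order statistics of $y_t^{(a)} := |y_t|$. The estimator in this case is

$\hat{\theta}_n^{(y)} = \arg\min_{\theta \in \Theta}\left\{\frac{1}{n}\sum_{t=1}^n(\ln\sigma_t^2(\theta) + \epsilon_t^2(\theta)) \times \hat{I}_{n,t}^{(\mathcal{E})}(\theta)\hat{I}_{n,t-1}^{(y)}\right\}$.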

Since $\hat{I}_{n,t-1}^{(y)} \stackrel{p}{\to} 1$, the score equations $s_t$ are square integrable, and $\epsilon_t$ is i.i.d., asymptotic
normality does not depend on whether $y_t$ is heavy tailed. Indeed, it is easy to show $\hat{\theta}_n^{(y)}$
is asymptotically equivalent to $\hat{\theta}_n$. The same property extends to feasible QMTTL with
trimming by $y_{t-1}$, denoted $\tilde{\theta}_n^{(y)}$. We therefore omit the proof of the next result.


Corollary 2.4. Under Assumptions 1 and 2, trimming by $y_{t-1}$ does not impact the
limit distributions of infeasible and feasible QMTTL estimators: $V_n^{1/2}(\hat{\theta}_n^{(y)} - \hat{\theta}_n) \stackrel{p}{\to} 0$ and
$V_n^{1/2}(\tilde{\theta}_n^{(y)} - \tilde{\theta}_n) \stackrel{p}{\to} 0$. Moreover, infeasible and feasible estimators are asymptotically
equivalent: $V_n^{1/2}(\tilde{\theta}_n^{(y)} - \hat{\theta}_n^{(y)}) \stackrel{p}{\to} 0$.
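
Trimming by lagged observations can be sketched as follows. The code is a minimal illustration with hypothetical function names: it evaluates the score weight of the linear volatility model at the origin, $\omega^{-1}[1, y_{t-1}^2, \omega]'$, to show how a large lag produces a spike, and it builds the indicator $I(|y_{t-1}| \leq y^{(a)}_{(k)})$ from order statistics of $|y_t|$ over the full sample, which is an assumption about the sample used for the threshold.

```python
def score_weight_at_origin(y_prev, omega):
    """s_t at alpha = beta = 0 in the linear model: (1/omega) * [1, y_{t-1}^2, omega]."""
    return [1.0 / omega, y_prev ** 2 / omega, 1.0]


def lagged_trim_indicator(ys, k):
    """I(|y_{t-1}| <= k-th largest |y|) for t = 1..n, with ys = [y_0, ..., y_n]."""
    abs_sorted = sorted((abs(v) for v in ys), reverse=True)
    cutoff = abs_sorted[k - 1]
    return [abs(ys[t - 1]) <= cutoff for t in range(1, len(ys))]


# Hypothetical sample with one large lagged value at t - 1 = 1.
ys = [0.5, -4.0, 0.2, 1.0, -0.3, 6.0]
print(score_weight_at_origin(ys[1], omega=2.0))  # second entry spikes with y_{t-1}^2
print(lagged_trim_indicator(ys, k=3))
```

In the estimator this indicator simply multiplies the error-based selection indicator, so an observation is dropped if either its error or its lagged level is a sample extreme.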
2.3. Verification of identification Assumption 2

We require an explicit model of $P(|\epsilon_t| > c)$ in order to verify Assumption 2. In our
simulation study, we use distributions with either power law or exponential tail decay.


2.3.1. Paretian tails

In the simulation experiment we use


$P(|\epsilon_t| > c) = (1 + c)^{-\kappa}$ with $\kappa \in (2,4)$, (7)

hence $\mathcal{E}_t$ has left and right tails:

$P(\mathcal{E}_t \leq -c) = P(\epsilon_t^2 \leq 1 - c) = 0$ if $c > 1$,

$= 1 - P(\epsilon_t^2 > 1 - c) = 1 - (2 - c)^{-\kappa/2}$ if $c \in [0,1]$, (8)

$P(\mathcal{E}_t \geq c) = P(\epsilon_t^2 \geq 1 + c) = (2 + c)^{-\kappa/2}$.

We show below identification $n^{1/2}(E[\mathcal{E}_t^2 I_{n,t}^{(\mathcal{E})}])^{-1/2}E[\mathcal{E}_t I_{n,t}^{(\mathcal{E})}] \to 0$ holds if $k_{i,n} \to \infty$,
$k_{i,n}/n \to 0$ and:


t-2,ra 

n 


l-2/re 


K — 2 


-1 + 


fcl.r 


1 - ki^ri/n 

2/K-1/2 ^ 


2/k 


2 

K — 2 n 


(9) 


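In practice, (9) is greatly simplified asymptotically by noting $k_{1,n}/n = o(1)$ and $(1 - k_{1,n}/n)^{-2/\kappa} - 1 \sim (2/\kappa)(k_{1,n}/n)$, hence identification applies if $(k_{2,n}/n)^{1-2/\kappa} \sim ((2\kappa - 2)/\kappa)(k_{1,n}/n)$ or

$$\frac{k_{2,n}}{n} \sim \left(\frac{2\kappa - 2}{\kappa}\,\frac{k_{1,n}}{n}\right)^{\kappa/(\kappa-2)}. \tag{10}$$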

A similar condition applies in the second order power law case P{\et\ > c) = dc~'^{l + 
ec~^) with d,e > 0, ^ > 0 and k G (2,4), while a less sharp result arises under P{\et\ > 
c) = dc~'^{l + o(I)). 

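In order to show (9), we must characterize the moments $E[\mathcal{E}_t I^{(\mathcal{E})}_{n,t}] = E[\mathcal{E}_t I(-\mathcal{L}_n \le \mathcal{E}_t \le \mathcal{U}_n)]$ and $E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}]$. Use (8) to deduce $\mathcal{U}_n = (n/k_{2,n})^{2/\kappa} - 2 \to \infty$ and $\mathcal{L}_n = 2 - (n/(n - k_{1,n}))^{2/\kappa} \in [0,1]$ as $n \to \infty$. Therefore,

$$E[\mathcal{E}_t I(-\mathcal{L}_n \le \mathcal{E}_t \le \mathcal{U}_n)] = -\left\{E[\mathcal{E}_t I(\mathcal{E}_t > \mathcal{U}_n)] + E[\mathcal{E}_t I(\mathcal{E}_t < -\mathcal{L}_n)]\right\} \tag{11}$$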
-|y°°(2 + dw - J\l-{2- d^l


K — 2\ n 


1-2/k 


+ 1 - 


n — ki > 


Ijn 


2 ^1,77 

K — 2 n 


Next E[£fl^J] ~ ^ follows from (15) below. Combined with (11) and by 

rearranging terms, Assumption 2 holds when E[£tl^^] = o{{E[£fI^^]Y/‘^/v}/'^), hence 
when 


fc2.ny At-2/ ^ / 1 y/" ^ 2 fci,„ \ 

n J 2 \ \ 1 — k — 2 n J 


+ o 


k2,n) 


( 12 ) 


Notice $k_{2,n}$ appears on both sides of the equality. In order to achieve (9), note $k_{2,n}/k_{1,n} \to 0$. This follows since $k_{1,n}/n = o(1)$ and by the mean-value-theorem $(1 - k_{1,n}/n)^{-2/\kappa} - 1 \sim (2/\kappa)k_{1,n}/n$, hence $(k_{2,n}/n)^{1-2/\kappa} \sim K k_{1,n}/n$, therefore

$$\left(\frac{k_{2,n}}{k_{1,n}}\right)^{1-2/\kappa} = \frac{(k_{2,n}/n)^{1-2/\kappa}}{(k_{1,n}/n)^{1-2/\kappa}} \sim K (k_{1,n}/n)^{2/\kappa} \to 0.$$

Now combine $k_{2,n}/k_{1,n} \to 0$ and (12) to deduce (9).

There are several things to note from (10). First, there are arbitrarily many valid $\{k_{1,n}, k_{2,n}\}$. Second, $\{k_{1,n}, k_{2,n}\}$ requires knowledge of $\kappa$, which can be consistently estimated for many processes defined by (1) (see Hill [29]). However, the method of moments estimator in Section 3 only requires one two-tailed fractile without knowledge of $\kappa$.

Third, $k_{2,n}/k_{1,n} \to 0$ since $k_{1,n}/n \to 0$ and $\kappa > 2$. This logically follows since $\mathcal{E}_t$ has support $[-1, \infty)$. The right tail is heavier, hence trimming a positive extreme must be off-set by trimming more negative observations in order to get $E[\mathcal{E}_t I^{(\mathcal{E})}_{n,t}] \approx 0$.

Fourth, $k_{1,n} \sim n/g_{1,n}$ for slowly varying $g_{1,n} \to \infty$ implies $k_{2,n} \sim n/g_{2,n}$ for slowly varying $g_{2,n}$ with $g_{2,n}/g_{1,n} \to \infty$. Similarly, $k_{1,n} \sim \lambda_1 n^{\delta_1}$ for $\lambda_1 \in (0,1)$ and $\delta_1 \in (2/\kappa, 1)$ implies $k_{2,n} \sim \lambda_2 n^{\delta_2}$ for $\lambda_2 \in (0,1)$ and $\delta_2 \in (0, \delta_1)$. Further, slowly varying $k_{1,n} \to \infty$ is not valid since $k_{2,n} \to 0$ is then required, which leads to asymptotic non-normality when $E[\epsilon_t^4] = \infty$.

Fifth, we need monotonically larger $k_{1,n}$ as $\kappa \searrow 2$, but always $\limsup_{n\to\infty}(k_{2,n}/k_{1,n}) < 1$. Exponential tails treated in Section 2.3.2 reveal an extreme case: there are no limitations on how we set $\{k_{1,n}, k_{2,n}\}$ outside of an upper bound, although $k_{1,n} > k_{2,n}$ always reduces small sample bias.

Finally, as a numerical example suppose $\kappa = 2.5$ and $n = 100$. If $k_{2,n} = 1$ then $k_{1,n} = 33$ renders (10) a near equality, although any $k_{1,n} \in \{29, \ldots, 35\}$ aligns with $k_{2,n} = 1$ by rounding. This is striking: we need to trim roughly 33 times as many negative $\mathcal{E}_t(\theta)$ as positive $\mathcal{E}_t(\theta)$ to approach unbiasedness at $n = 100$. If $n = 800$ then, for example, $k_{2,n} = 2$ aligns with roughly $k_{1,n} = 200$.
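The numerical example can be reproduced with a short sketch. It assumes the fractile relation in (10) reads $(k_{2,n}/n)^{1-2/\kappa} = ((2\kappa-2)/\kappa)(k_{1,n}/n)$, treated here as an exact equality; the function names are illustrative.

```python
def k1_from_k2(n, k2, kappa):
    """Solve (k2/n)^(1 - 2/kappa) = ((2*kappa - 2)/kappa) * (k1/n) for k1."""
    return n * (kappa / (2.0 * kappa - 2.0)) * (k2 / n) ** (1.0 - 2.0 / kappa)

def k2_from_k1(n, k1, kappa):
    """Invert the same relation: k2/n = (((2*kappa - 2)/kappa) * k1/n)^(kappa/(kappa - 2))."""
    return n * (((2.0 * kappa - 2.0) / kappa) * k1 / n) ** (kappa / (kappa - 2.0))

# Left-tail fractiles implied by one or two trimmed right-tail extremes, kappa = 2.5
print(round(k1_from_k2(100, 1, 2.5)))   # n = 100, k2 = 1
print(round(k1_from_k2(800, 2, 2.5)))   # n = 800, k2 = 2

# All k1 whose implied k2 rounds to 1 when n = 100
print([k for k in range(1, 100) if round(k2_from_k1(100, k, 2.5)) == 1])
```

Under this reading the implied left-tail fractiles are about 33 and 201, and the set of $k_{1,n}$ that round to $k_{2,n} = 1$ at $n = 100$ is $\{29, \ldots, 35\}$, in line with the values quoted in the text.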
2.3.2. Exponential tails

Now suppose $\epsilon_t$ has a Laplace distribution:

$$P(\epsilon_t < -c) = \tfrac{1}{2}\exp\{-\sqrt{2}c\} \text{ for } c > 0 \quad \text{and} \quad P(\epsilon_t > c) = \tfrac{1}{2}\exp\{-\sqrt{2}c\} \text{ for } c > 0.$$

We use a normal distribution in our simulation study, but the exposition here is greatly simplified under Laplace, while the conclusions are the same.

We have $P(\mathcal{E}_t < -c) = 1 - \exp\{-\sqrt{2}(1 - c)^{1/2}\}$ and $P(\mathcal{E}_t > c) = \exp\{-\sqrt{2}(1 + c)^{1/2}\}$.

The following are then straightforward to verify: £„ = 1 — (ln(n/(n — and Un = 

(ln(0.5n/fe2,„))^ — 1, hence 


E[£tIi-Cn<£t<Hn)] 

n n J \n-ki^nj{ \n-ki^nj \ ^ J 

Observe $E[\mathcal{E}_t I(-\mathcal{L}_n \le \mathcal{E}_t \le \mathcal{U}_n)] \approx 0$ when $k_{1,n} > k_{2,n}$, hence if $k_{1,n}/n \to 0$ then $k_{2,n}/n \to 0$ must hold.

Since $E[\epsilon_t^4] < \infty$ we need $E[\mathcal{E}_t I(-\mathcal{L}_n \le \mathcal{E}_t \le \mathcal{U}_n)] = o(1/n^{1/2})$. Notice $\ln(n/(n - k_{1,n})) \sim k_{1,n}/n$. Hence if simply each $k_{i,n} = o(n^{1/2})$, then we achieve $E[\mathcal{E}_t I(-\mathcal{L}_n \le \mathcal{E}_t \le \mathcal{U}_n)] = o(1/n^{1/2})$. This implies that technically we do not even need asymmetric trimming $k_{1,n} > k_{2,n}$ as long as we set $k_{1,n} = k_{2,n} = o(n^{1/2})$. This follows since tails are so thin that in general extremes on $[0, \infty)$ are not much larger than extremes on $[-1, 0)$ in small samples. Similarly, we can use any form of asymmetric trimming that satisfies $k_{i,n} = o(n^{1/2})$. We show by simulation that as $n$ gets large, bias evaporates irrespective of $\{k_{1,n}, k_{2,n}\}$, but $k_{1,n} > k_{2,n}$ always leads to lower small sample bias.

2.3.3. Remarks 

We demonstrate by simulation in Section 5 that using $k_{1,n} = 10k_{2,n}$ or $k_{1,n} = 35k_{2,n}$ for either $n \in \{100, 800\}$ and either Paretian or Gaussian $\epsilon_t$ leads to a superb QMTTL estimator. Indeed, simply using symmetric trimming $k_{1,n} = k_{2,n}$ still leads to a better estimator than Log-LAD and Weighted Laplace QML in terms of small sample bias and approximate normality, although Power-Law QML tends to have lower bias and be closer to normal. In general using bias minimizing fractiles, like $k_{1,n} = 100k_{2,n}$ for Paretian $\epsilon_t$ when $n = 800$, is not evidently required for obtaining low bias in finite samples, as long as $k_{1,n}$ is comparatively large relative to $k_{2,n}$, in which case QMTTL trumps Log-LAD, WLQML and PQML.

We also find that our method of moments estimator in Section 3 dominates Log-LAD, WLQML and PQML, although QMTTL with $k_{1,n} = 35k_{2,n}$ leads to smaller bias and is closer to normally distributed in nearly every case. Nevertheless, the method of moments estimator is always asymptotically unbiased and easier to implement because trimming is symmetric. Which estimator is chosen in practice depends on the analyst's preferences: method of moments is guaranteed to be asymptotically unbiased, but QMTTL has superior small sample properties even if $\{k_{1,n}, k_{2,n}\}$ are not chosen to ensure asymptotic unbiasedness.
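The scale $\mathcal{V}_n$ and rate of convergence depend on the error tail index $\kappa > 2$. If $E[\epsilon_t^4] < \infty$ then by dominated convergence $E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}] = E[(\epsilon_t^2 - 1)^2 I^{(\mathcal{E})}_{n,t}] \to E[(\epsilon_t^2 - 1)^2] = E[\epsilon_t^4] - 1$, thus $\mathcal{V}_n \sim n(E[\epsilon_t^4] - 1)^{-1}E[s_t s_t']$, the classic QML asymptotic covariance matrix. This implies trimming does not affect efficiency asymptotically. Hence, we now assume $E[\epsilon_t^4] = \infty$.

Let the intermediate order sequences $\{k_n\}$ and positive thresholds $\{\mathcal{C}_n(\theta)\}$ satisfy

$$P(|\mathcal{E}_t(\theta)| > \mathcal{C}_n(\theta)) = \frac{k_n}{n}.$$

The rate $E[\mathcal{E}_t^2(\theta) I^{(\mathcal{E})}_{n,t}(\theta)] \to \infty$ is logically governed by the right tail of $\mathcal{E}_t(\theta) = \epsilon_t^2(\theta) - 1 \in [-1, \infty)$ since by dominated convergence:

$$E[\mathcal{E}_t^2(\theta) I^{(\mathcal{E})}_{n,t}(\theta)] = E[\mathcal{E}_t^2(\theta) I(-\mathcal{L}_n(\theta) \le \mathcal{E}_t(\theta) \le \mathcal{U}_n(\theta))] \sim E[\mathcal{E}_t^2(\theta) I(|\mathcal{E}_t(\theta)| \le \mathcal{C}_n(\theta))]$$

as though $\mathcal{E}_t(\theta)$ were symmetrically trimmed with thresholds and fractile

$$\mathcal{C}_n(\theta) = \mathcal{U}_n(\theta) \quad \text{and} \quad k_n = k_{2,n}. \tag{13}$$

Note $E[\mathcal{E}_t^2(\theta) I^{(\mathcal{E})}_{n,t}(\theta)] \sim E[\mathcal{E}_t^2(\theta) I(|\mathcal{E}_t(\theta)| \le \mathcal{C}_n(\theta))]$ is useful for characterizing the convergence rate, but identification Assumption 2 in general requires $k_{1,n} > k_{2,n}$, hence $\mathcal{L}_n(\theta) < \mathcal{U}_n(\theta)$.

As long as $E[\epsilon_t^4] = \infty$, the rate of convergence is $o(n^{1/2})$: heavy tailed errors can only adversely affect the convergence rate. The exact rate can be deduced by observing that from $P(|\epsilon_t| > a) = da^{-\kappa}(1 + o(1))$ the variable $\mathcal{E}_t = \epsilon_t^2 - 1$ has a tail sum dominated by the right tail:

$$P(|\mathcal{E}_t| > a) = P(\epsilon_t^2 > 1 + a) + P(\epsilon_t^2 < 1 - a) = d(1 + a)^{-\kappa/2}(1 + o(1)) = da^{-\kappa/2}(1 + o(1)) \text{ as } a \to \infty. \tag{14}$$

Hence, the thresholds $\mathcal{C}_n$ can always be chosen as $\mathcal{C}_n = d^{2/\kappa}(n/k_n)^{2/\kappa}$. Now use an implication of Karamata's theorem to obtain as $n \to \infty$ (e.g., Resnick [55], Theorem 0.6):^

$$\kappa = 4:\ E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}] \sim d\ln(n),$$

$$\kappa \in (2,4):\ E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}] \sim \frac{\kappa}{4 - \kappa}\,\mathcal{C}_n^2 P(|\mathcal{E}_t| > \mathcal{C}_n) = \frac{\kappa}{4 - \kappa}\,d^{4/\kappa}\left(\frac{n}{k_n}\right)^{4/\kappa - 1} = o(n). \tag{15}$$

^Note if $\kappa = 4$ then for finite $a > 0$ there exists $K > 0$ such that $\int_0^{\mathcal{C}_n^2} P(\mathcal{E}_t > u^{1/2})\,du = K + \int_a^{\mathcal{C}_n^2} P(\mathcal{E}_t > u^{1/2})\,du \sim K + d\int_a^{\mathcal{C}_n^2} u^{-1}\,du \sim d\ln(\mathcal{C}_n^2) \sim d\ln(n)$ since $\mathcal{C}_n = K(n/k_n)^{1/2}$.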
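The following claim summarizes the above details.

Theorem 2.5 (Convergence rate). Under Assumptions 1 and 2 if $\kappa > 4$ then $\mathcal{V}_n \sim n(E[\epsilon_t^4] - 1)^{-1}E[s_t s_t']$. If $\kappa \le 4$ then for $i = 1, \ldots, q$

$$\kappa = 4:\ \mathcal{V}_{i,i,n}^{1/2} \sim \left(\frac{n}{\ln(n)}\right)^{1/2}\left(\frac{E[s_{i,t}^2]}{d}\right)^{1/2},$$

$$\kappa \in (2,4):\ \mathcal{V}_{i,i,n}^{1/2} \sim n^{1/2}\left(\frac{k_n}{n}\right)^{2/\kappa - 1/2} d^{-2/\kappa}\left(\frac{4 - \kappa}{\kappa}\right)^{1/2}\left(E[s_{i,t}^2]\right)^{1/2}. \tag{16}$$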


There are several key observations. First, as long as $\kappa \in (2,4)$ then elevating $k_n$ arbitrarily close to a fixed percent of $n$, that is $k_n \sim \lambda n$ for $\lambda \in (0,1)$, will optimize the convergence rate. This is logical since large errors adversely affect efficiency. In general this implies

$$k_n \sim n/g_n \text{ for } g_n \to \infty \text{ at a slow rate}, \tag{17}$$

ensures $\mathcal{V}_{i,i,n}^{1/2} \sim K n^{1/2}/g_n^{2/\kappa - 1/2}$ for any $\kappa \in (2,4)$ by (16). Hence, $\mathcal{V}_{i,i,n}^{1/2} \to \infty$ can be driven as close to rate $n^{1/2}$ as we choose by setting $g_n \to \infty$ very slowly (e.g., $g_n = \ln(\ln(n))$). Further, the rate $n^{1/2}/g_n^{2/\kappa - 1/2}$ monotonically increases to $n^{1/2}$ as $\kappa \nearrow 4$. Hall and Yao [26] show the QML rate is $n^{1 - 2/\kappa}/L(n)$ for some slowly varying $L(n) \to \infty$ and any $\kappa \in (2,4]$, hence QMTTL can be assured to be faster for every $\kappa \in (2,4)$. Conversely, Peng and Yao's [52] Log-LAD and non-Gaussian QML are $n^{1/2}$-convergent (cf. Berkes and Horvath [5], Zhu and Ling [61]), but the higher rate is not without costs: (i) these estimators are not robust to error extremes in small samples: see Section 5; (ii) Log-LAD requires $\ln\epsilon_t^2$ to have a zero median; and (iii) non-Gaussian QML requires additional moment conditions for Fisher consistency, for example, WLQML requires $E|\epsilon_t| = 1$: see Section 1 for discussion.

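Second, if $\kappa < 4$ and we use a fractile form $k_n \sim \lambda n/g_n$ for slow $g_n \to \infty$ and $\lambda \in (0,1]$, then

$$\frac{n^{1/2}}{g_n^{2/\kappa - 1/2}}(\hat{\theta}_n - \theta^0) \overset{d}{\to} N\left(0,\ \lambda^{1 - 4/\kappa}\,\frac{\kappa}{4 - \kappa}\,d^{4/\kappa}\left(E[s_t s_t']\right)^{-1}\right) = N(0, V(\lambda, \kappa, d)). \tag{18}$$

For example, in our simulation study we use $k_n \sim \lambda n/\ln(n)$, hence $\hat{\theta}_n$ is $n^{1/2}/(\ln(n))^{2/\kappa - 1/2}$-convergent with asymptotic variance $V(\lambda, \kappa, d)$. The asymptotic variance $V(\lambda, \kappa, d)$ can always be decreased by increasing $\lambda$ and therefore removing more extremes per sample.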
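To make the rate comparison concrete, the following sketch evaluates the QMTTL rate $n^{1/2}/(\ln(n))^{2/\kappa - 1/2}$ implied by $k_n \sim \lambda n/\ln(n)$ against the QML rate $n^{1-2/\kappa}$ reported by Hall and Yao. It is illustrative only: constants are ignored, the slowly varying divisor $L(n)$ of the QML rate is dropped (which favors QML), and the function names are hypothetical.

```python
import math

def qmttl_rate(n, kappa):
    """n^(1/2) / (ln n)^(2/kappa - 1/2): QMTTL rate when k_n ~ lambda*n/ln(n), kappa in (2,4)."""
    return math.sqrt(n) / math.log(n) ** (2.0 / kappa - 0.5)

def qml_rate_upper(n, kappa):
    """n^(1 - 2/kappa): QML rate with the slowly varying divisor L(n) ignored."""
    return n ** (1.0 - 2.0 / kappa)

for kappa in (2.5, 3.0, 3.5):
    for n in (100, 800):
        print(kappa, n, round(qmttl_rate(n, kappa), 2), round(qml_rate_upper(n, kappa), 2))
```

Even with the divisor $L(n)$ dropped, the trimmed rate is the larger of the two for each pair shown, and the gap widens as $\kappa$ approaches 2.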

Third, in view of $k_n = k_{2,n}$ by (13), trimming rule (17) only concerns the amount of trimmed positive observations of $\mathcal{E}_t = \epsilon_t^2 - 1$: the left tail of $\mathcal{E}_t$ is bounded, hence only the rate of right tail trimming of $\mathcal{E}_t$ matters for the convergence rate. In terms of identification, however, as discussed in Section 2.3 the number of trimmed left and right tail observations $k_{1,n}$ and $k_{2,n}$ must be balanced when $\epsilon_t$ is governed by a heavy tailed distribution. For example, if $P(|\epsilon_t| > c) = (1 + c)^{-\kappa}$ with $\kappa \in (2,4)$, and $k_{1,n} \sim \lambda n/\ln(n)$, both as in our simulation study, then Assumption 2 holds when $k_{2,n} \sim Kn(k_{1,n}/n)^{\kappa/(\kappa-2)} \sim Kn/(\ln(n))^{\kappa/(\kappa-2)}$, hence from (18) the rate of convergence is $n^{1/2}/((\ln(n))^{\kappa/(\kappa-2)})^{2/\kappa - 1/2} = n^{1/2}/(\ln(n))^{(4-\kappa)/[2(\kappa-2)]}$.

As a practical matter, naturally too much trimming in any given sample can lead to small sample bias in $\hat{\theta}_n$. In Section 5, we use $k_n \sim \lambda n/\ln(n)$ with $\lambda = 0.025$ for both very thin and thick tailed error distributions: values much larger than 0.025 (e.g., $\lambda = 0.10$) lead to substantial bias, and values much smaller (e.g., $\lambda = 0.01$) are not effective for rendering $\hat{\theta}_n$ approximately normal in small samples. In general any value $\lambda \in [0.02, 0.05]$ leads to roughly the same results. Similar trimming schemes are found to be highly successful in other robust estimation and inference contexts: see Hill [30, 31] and Hill and Aguilar [33].

Last, there are several proposed methods in the robust statistics literature for selecting trimming parameters like $\lambda$, but in this literature the seemingly universal approach for data transformations involves a fixed quantile threshold, hence $k_n \sim \lambda n$ (cf. Huber [34], Hampel et al. [27], Jureckova and Sen [37]). Such methods include covariance determinant or asymptotic variance minimization where a unique internal solution for $\lambda$ exists. These methods are ill posed here since they lead to corner solutions: consider that minimizing $V(\lambda, \kappa, d)$ above on $\lambda \in [\underline{\lambda}, \bar{\lambda}]$ leads to $\lambda = \bar{\lambda}$. See Hill and Aguilar [33] for references and simulation evidence. In terms of inference more choices exist, including test statistic functionals over $\lambda$ like the supremum, and empirical process techniques for p-value computation (see Hill [30]).

3. Method of moments with re-centering 

Our second estimator uses the method of moments based on negligibly weighted errors imbedded in a QML score equation. This gives us the advantage of re-centering to ensure identification. It therefore allows us to use a greater variety of error transforms, as well as symmetric transforms even if the errors have an asymmetric distribution. Define $\Im_t := \sigma(y_\tau : \tau \le t)$.

The class of transformations we consider has the general form

$$\psi(u, c) := u \times \varpi(u, c) \times I(|u| \le c), \tag{19}$$

where $\varpi(\cdot, c)$ is for each $c$ a Borel function, and

$$\lim_{c \to \infty} \varpi(u, c) \times I(|u| \le c) = 1. \tag{20}$$

Thus, $\psi(u, c)$ is a redescending function (see Andrews et al. [1] and Hampel et al. [27]). In the literature typically $c$ is fixed, but the only way we can identify $\theta^0$ and obtain Fisher consistency without an additional simulation step is to enforce $c \to \infty$ as $n \to \infty$.^ Notice as $c \to \infty$ the transform satisfies $\psi(u, c) \to u$, hence it applies a negligible weight to $u$. Further, it operates similar to tail-trimming since by (20)

$$\psi(u, c) = uI(|u| \le c) \times (1 + o(1)) \text{ as } c \to \infty. \tag{21}$$

^See, for example, Sakata and White [58], Cantoni and Ronchetti [11] and Mancini et al. [43].


We focus on two types of weights $\varpi$. First, the simple trimming case $\psi(u, c) = uI(|u| \le c)$, hence

$$\varpi(u, c) = 1.$$

The theory developed below easily extends to related redescending functions $\psi(u, c)$, like Hampel's three-part trimming function with thresholds $0 < a < b < c$ (see Andrews et al. [1]):

$$\psi(u, c) = \begin{cases} u, & 0 \le |u| \le a, \\ a \times \mathrm{sign}(u), & a < |u| \le b, \\ a \times \dfrac{c - |u|}{c - b} \times \mathrm{sign}(u), & b < |u| \le c, \\ 0, & c < |u|. \end{cases}$$

This can be identically written as (19) with

$$\varpi(u, c) = I(|u| \le a) + \frac{a}{|u|} \times I(a < |u| \le b) + \frac{a(c - |u|)}{|u|(c - b)} \times I(b < |u| \le c). \tag{22}$$

Of course, we abuse notation since there are three thresholds $\{a, b, c\}$. By construction $\varpi(u, c) \in [0, 1]$, while negligibility requires the smallest threshold $a \to \infty$, hence $\varpi(u, c) \to 1$ as $a \to \infty$.

Second, we use smooth weights $\varpi(u, c)$ that are continuously differentiable in $c$, with

$$\left|\frac{\partial}{\partial c}\varpi(u, c)\right| \times I(|u| \le c) \le K\frac{1}{c}. \tag{23}$$

Notice the simple trimming case $\varpi(u, c) = 1$ trivially satisfies (23). Thus, as $c \to \infty$ the transform derivative $(\partial/\partial c)\psi(u, c) \to 0$ at rate $O(1/c)$ for all $|u| \le c$. An example is Tukey's bisquare $\varpi(u, c) = (1 - (u/c)^2)^2$ with $(\partial/\partial c)\varpi(u, c) = 4(1 - (u/c)^2)u^2/c^3$, which is bounded by $4/c$ when $|u| \le c$, hence (23) holds. A second example is the exponential $\varpi(u, c) = \exp\{-|u|/c\}$ with $(\partial/\partial c)\varpi(u, c) = \exp\{-|u|/c\}|u|/c^2$.

Assumption 4 (Redescending transforms). Let $\psi(u, c)$ satisfy (19), (20) and (23).
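The transforms discussed above can be written down directly. The sketch below is illustrative (function names are not from the paper): it implements the general form (19) with the simple trimming, Tukey bisquare and exponential weights, and checks numerically that Hampel's three-part function coincides with (19) under the weight in (22).

```python
import math

def psi(u, c, weight):
    """psi(u, c) = u * weight(u, c) * I(|u| <= c), the general form (19)."""
    return u * weight(u, c) if abs(u) <= c else 0.0

def w_trim(u, c):
    return 1.0                                   # simple trimming

def w_tukey(u, c):
    return (1.0 - (u / c) ** 2) ** 2             # Tukey's bisquare

def w_exp(u, c):
    return math.exp(-abs(u) / c)                 # exponential weight

def hampel_psi(u, a, b, c):
    """Hampel's three-part function written piecewise, thresholds 0 < a < b < c."""
    sign, size = math.copysign(1.0, u), abs(u)
    if size <= a:
        return u
    if size <= b:
        return a * sign
    if size <= c:
        return a * (c - size) / (c - b) * sign
    return 0.0

def hampel_weight(u, a, b, c):
    """The weight in (22); values for |u| > c are irrelevant because psi applies I(|u| <= c)."""
    size = abs(u)
    if size <= a:
        return 1.0
    if size <= b:
        return a / size
    return a * (c - size) / (size * (c - b))

# Hampel's function and the weighted form (19) agree at every test point
for u in (-7.0, -3.5, -1.2, 0.0, 0.4, 2.0, 2.9, 5.0):
    via_weight = psi(u, 3.0, lambda v, cc: hampel_weight(v, 1.0, 2.0, cc))
    assert abs(via_weight - hampel_psi(u, 1.0, 2.0, 3.0)) < 1e-12
```

As the threshold grows, each weight approaches one on any fixed $u$, which is the negligibility property (20): for instance `psi(0.5, 1e6, w_exp)` differs from 0.5 by less than $10^{-5}$.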

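Now define two-tailed observations $\epsilon^{(a)}_t(\theta) := |\epsilon_t(\theta)|$ and their order statistics $\epsilon^{(a)}_{(1)}(\theta) \ge \epsilon^{(a)}_{(2)}(\theta) \ge \cdots \ge \epsilon^{(a)}_{(n)}(\theta)$, and let $\{k_n\}$ be an intermediate order sequence. Write

$$\hat{I}^{(\epsilon)}_{n,t}(\theta) := I(|\epsilon_t(\theta)| \le \epsilon^{(a)}_{(k_n)}(\theta)),$$

$$\hat{\psi}_{n,t}(\theta) := \psi(\epsilon_t(\theta), \epsilon^{(a)}_{(k_n)}(\theta)) = \epsilon_t(\theta) \times \varpi(\epsilon_t(\theta), \epsilon^{(a)}_{(k_n)}(\theta))\,\hat{I}^{(\epsilon)}_{n,t}(\theta),$$

and define re-centered equations and a Method of Negligibly-Weighted Moments (MNWM) estimator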




:=argmin( j 


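Any positive definite symmetric weight matrix $W \in \mathbb{R}^{q \times q}$ leads to the same solution $\arg\min_{\theta \in \Theta}(\sum_{t=1}^n \hat{m}_{n,t}(\theta))' \times W \times (\sum_{t=1}^n \hat{m}_{n,t}(\theta))$. Similarly, any $\Im_{t-1}$-measurable uniformly $L_{2+\iota}$-bounded vector $z_t(\theta) \in \mathbb{R}^r$, $r \ge q$, can be used instead of $s_t(\theta)$ for a GMM estimator (Hansen [28]). The scaled volatility derivative $s_t(\theta)$, however, provides an analogue to QML. Finally, as discussed in Section 2 small sample performance appears to be improved if we also trim by $y_{t-1}$, while asymptotics are unchanged if trimming is negligible. The estimator in this case uses the transformed error $\epsilon_t(\theta)\varpi(\epsilon_t(\theta), \epsilon^{(a)}_{(k_n)}(\theta))\,\hat{I}^{(\epsilon)}_{n,t}(\theta)\,\hat{I}^{(y)}_{n,t-1}$.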

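Next, for asymptotics let $\{\mathcal{C}_n(\theta)\}$ satisfy

$$P(|\epsilon_t(\theta)| > \mathcal{C}_n(\theta)) = \frac{k_n}{n},$$

write compactly

$$I^{(\epsilon)}_{n,t}(\theta) := I(|\epsilon_t(\theta)| \le \mathcal{C}_n(\theta)),$$

$$\psi_{n,t}(\theta) := \psi(\epsilon_t(\theta), \mathcal{C}_n(\theta)) \quad \text{and} \quad \epsilon_{n,t}(\theta) := \epsilon_t(\theta) I^{(\epsilon)}_{n,t}(\theta),$$

and define equations with non-random thresholds

$$m_{n,t}(\theta) := \left(\psi^2_{n,t}(\theta) - E[\psi^2_{n,t}(\theta)]\right) \times \left(s_t(\theta) - E[s_t(\theta)]\right).$$

In view of re-centering in $\hat{m}_{n,t}(\theta)$ it can be shown that, asymptotically, $m_{n,t}$ and $\hat{m}_{n,t}$ are interchangeable. See the Appendix.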
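A feasible version of the re-centered equations can be sketched as below. This is an assumption-laden illustration rather than the paper's definition: it takes the feasible equations to be the sample analogue of $m_{n,t}(\theta)$, replacing $E[\psi^2_{n,t}]$ and $E[s_t]$ with sample means, it uses simple trimming $\varpi = 1$, and the function names are hypothetical.

```python
def kth_largest_abs(x, k):
    """The k-th largest absolute value, used as the data-driven threshold."""
    return sorted((abs(v) for v in x), reverse=True)[k - 1]

def recentered_moment_sum(eps, scores, k_n):
    """Sum over t of (psi_t^2 - mean(psi^2)) * (s_t - mean(s)) under simple trimming.

    eps    : list of residuals eps_t(theta)
    scores : list of score vectors s_t(theta), each a list of length q
    k_n    : two-tailed trimming fractile
    """
    n, q = len(eps), len(scores[0])
    cut = kth_largest_abs(eps, k_n)
    psi2 = [e * e if abs(e) <= cut else 0.0 for e in eps]   # squared trimmed errors
    psi2_bar = sum(psi2) / n
    s_bar = [sum(row[j] for row in scores) / n for j in range(q)]
    return [sum((psi2[t] - psi2_bar) * (scores[t][j] - s_bar[j]) for t in range(n))
            for j in range(q)]
```

An estimator along these lines would minimize the squared length of `recentered_moment_sum` over $\theta$, recomputing the residuals and scores at each candidate value.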

Since $\epsilon_t$ is i.i.d. and has a smooth distribution, the transform is negligible in that $\psi_{n,t}(\theta) \to \epsilon_t(\theta)$, and $s_t$ is $\Im_{t-1}$-measurable, it follows for all $n \ge N$ and some large $N \in \mathbb{N}$

$$E[m_{n,t}(\theta) \mid \Im_{t-1}] = 0 \text{ if and only if } \theta = \theta^0,$$

hence an identification condition like Assumption 2 automatically holds. Similarly, by negligibility $\psi(u, c) = uI(|u| \le c) \times (1 + o(1))$ as $c \to \infty$ and $E[\epsilon^2_{n,t}] \to 1$, hence by independence of the errors

$$E[m_{n,t}m_{n,t}'] = E\left[\left(\psi^2(\epsilon_t, \mathcal{C}_n) - E[\psi^2(\epsilon_t, \mathcal{C}_n)]\right)^2\right] \times E[(s_t - E[s_t])(s_t - E[s_t])']$$

$$= E\left[\left(\epsilon^2_{n,t} - E[\epsilon^2_{n,t}]\right)^2\right] \times E[(s_t - E[s_t])(s_t - E[s_t])'] \times (1 + o(1))$$

$$= \left(E[\epsilon^4_{n,t}] - 1\right) \times E[(s_t - E[s_t])(s_t - E[s_t])'] \times (1 + o(1)).$$

The MNWM scale is therefore

$$\mathring{\mathcal{V}}_n = \frac{n}{E[\epsilon^4_{n,t}] - 1} \times E[(s_t - E[s_t])(s_t - E[s_t])'], \tag{24}$$

which is positive definite under Assumption 1.

1/2 

Theorem 3.1 (MNWM). Under Assumptions 1 and 4 — 6^) N{0,lq). 

o o 

Further each Vi^i^n —>■ oo and —>■ (0,1). 

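Remark 7. In general a direct comparison of QMTTL and MNWM scales $\mathcal{V}_n$ and $\mathring{\mathcal{V}}_n$ is difficult for a particular $n$ due to the different trimming strategies. Notice, however, that $E[\epsilon^4_{n,t}] - 1 = E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}] \times (1 + o(1))$ if $\mathcal{C}_n = (\mathcal{U}_n + 1)^{1/2}$. This follows by noting $E[\epsilon_t^2] = 1$, $\mathcal{E}_t \in [-1, \infty)$, negligibility and dominated convergence imply

$$E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}] = E[(\epsilon_t^4 - 2\epsilon_t^2 + 1) I(-\mathcal{L}_n \le \epsilon_t^2 - 1 \le \mathcal{U}_n)]$$

$$= E[\epsilon_t^4 I((1 - \mathcal{L}_n)^{1/2} \le |\epsilon_t| \le (\mathcal{U}_n + 1)^{1/2})] \times (1 + o(1))$$

$$= E[\epsilon_t^4 I(|\epsilon_t| \le (\mathcal{U}_n + 1)^{1/2})] \times (1 + o(1)).$$

Thus, $\mathring{\mathcal{V}}_n \times \mathcal{V}_n^{-1} \to E[(s_t - E[s_t])(s_t - E[s_t])'] \times (E[s_t s_t'])^{-1}$ as $n \to \infty$. Therefore $\mathring{\mathcal{V}}_n$ is smaller than $\mathcal{V}_n$ due to the centered term $E[(s_t - E[s_t])(s_t - E[s_t])']$, hence identification is assured at a cost of efficiency.

Remark 8. Since $\mathring{\mathcal{V}}_n \sim \mathcal{K}_n\mathcal{V}_n$ for some sequence of positive definite matrices $\{\mathcal{K}_n\}$, the Section 2.4 discourse on the QMTTL rate of convergence carries over here.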


4. Inference 


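In view of $\mathcal{V}_n \sim n(E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}])^{-1}E[s_t s_t']$, a natural estimator of the QMTTL scale $\mathcal{V}_n$ is

$$\hat{\mathcal{V}}_n = \hat{\mathcal{V}}_n(\hat{\theta}_n) = n \times \left(\frac{1}{n}\sum_{t=1}^n \mathcal{E}_t^2(\hat{\theta}_n)\hat{I}^{(\mathcal{E})}_{n,t}(\hat{\theta}_n)\right)^{-1} \times \frac{1}{n}\sum_{t=1}^n s_t(\hat{\theta}_n)s_t'(\hat{\theta}_n). \tag{25}$$

Theorem 4.1. Under Assumptions 1 and 2, $\hat{\mathcal{V}}_n = \mathcal{V}_n(1 + o_p(1))$.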
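The plug-in scale can be sketched as follows, under the assumption that the estimator is the sample analogue of $n(E[\mathcal{E}_t^2 I^{(\mathcal{E})}_{n,t}])^{-1}E[s_t s_t']$ evaluated at the estimate. Inputs and names are illustrative: `e_stat` holds $\epsilon_t^2(\hat{\theta}_n) - 1$, `keep` the trimming indicators, and `scores` the vectors $s_t(\hat{\theta}_n)$.

```python
def qmttl_scale(e_stat, keep, scores):
    """V_hat = n * (mean of E_t^2 * I_t)^(-1) * (mean of s_t s_t'), returned as a q x q list."""
    n, q = len(e_stat), len(scores[0])
    trimmed_second_moment = sum(e * e for e, kept in zip(e_stat, keep) if kept) / n
    outer_mean = [[sum(s[i] * s[j] for s in scores) / n for j in range(q)]
                  for i in range(q)]
    return [[n * outer_mean[i][j] / trimmed_second_moment for j in range(q)]
            for i in range(q)]

# Tiny worked example: three observations, the largest |E_t| trimmed
v_hat = qmttl_scale([1.0, -1.0, 2.0], [True, True, False],
                    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(v_hat)
```

Because the trimmed second moment is computed from the data, no knowledge of the tail index or the convergence rate is needed to form the scale.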

Remark 9. Notice = V„(l +Op(l)) only reduces to t^n = V„ +Op(l) when E[e'l] < oo. 
In general classic inference is available without knowing the true rate of convergence, nor 
even if trimming is required. 

Remark 10. A consistent estimator of the MNWM scale $\mathring{\mathcal{V}}_n$ can similarly be constructed.






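A Wald statistic naturally follows for a test of (non)linear parameter restrictions $R(\theta^0) = 0$ where $R : \Theta \to \mathbb{R}^J$ and $J \ge 1$. Assume $R$ is differentiable with a gradient $\mathcal{D}(\theta) = (\partial/\partial\theta)R(\theta)$ that is continuous, differentiable and has full column rank. The test statistic with the QMTTL estimator as a plug-in is

$$\mathcal{W}_n = R(\hat{\theta}_n)'\left(\mathcal{D}(\hat{\theta}_n)\hat{\mathcal{V}}_n^{-1}\mathcal{D}(\hat{\theta}_n)'\right)^{-1}R(\hat{\theta}_n).$$

Use Theorems 2.2 and 4.1 to deduce $\mathcal{W}_n \overset{d}{\to} \chi^2(J)$ under the null, and if $R(\theta^0) \ne 0$ then $\mathcal{W}_n \overset{p}{\to} \infty$.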
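A Wald statistic of this quadratic form can be computed with a few lines of linear algebra. The sketch below is illustrative and uses only the standard library; it assumes the gradient is supplied as a $J \times q$ array and the scale estimate as a $q \times q$ array, and the helper `solve` is a plain Gauss-Jordan routine written for this example.

```python
def solve(a, b):
    """Solve a x = b for square a and matrix b (lists of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def wald_statistic(r_val, d_mat, v_hat):
    """W = R' (D V^{-1} D')^{-1} R with R of length J, D of shape J x q, V of shape q x q."""
    q, j = len(v_hat), len(r_val)
    d_t = [[d_mat[a][i] for a in range(j)] for i in range(q)]            # D', q x J
    v_inv_dt = solve(v_hat, d_t)                                          # V^{-1} D'
    middle = [[sum(d_mat[a][i] * v_inv_dt[i][b] for i in range(q)) for b in range(j)]
              for a in range(j)]                                          # D V^{-1} D'
    x = solve(middle, [[v] for v in r_val])                               # (D V^{-1} D')^{-1} R
    return sum(r_val[a] * x[a][0] for a in range(j))
```

The result would be compared with a $\chi^2(J)$ critical value; for a single restriction at the 5% level that value is approximately 3.84.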

Similarly, the proof of Theorem 2.1 shows the QMTTL first order condition is 1/n x 
S"=i i^ri) = 0 a.s. This naturally suggests the possibility of a score or Lagrange 

Multiplier test since a QMTTL estimator under the constraint R{9^) = 0, denoted 9n\ 
also satisfies i/'nJ27=i ''^t{9n^)I^t (^n^) ^ 0 if the constraint is true. A heavy tail robust 
test of $R(\theta^0) = 0$ can therefore be couched as a tail-trimmed moment condition test as in Hill and Aguilar [33].


5. Simulation 

We now compare our robust QML and Method of Moments estimators with various 
estimators in the literature. In order to draw the best comparisons between QMTTL and 
MNWM, we initially focus on simple trimming for MNWM. We compare our estimators 
to QML as a benchmark, as well as Log-LAD, Weighted Laplace QML (WLQML) and 
Power-Law QML (PQML) due to their heavy tail robustness properties. Finally, we 
investigate other redescending transforms as alternatives for MNWM, and whether tail-trimming can improve the small sample properties of PQML.


5.1. Data generation and estimators 

Let $P_\kappa$ denote a symmetric Pareto distribution: if $\epsilon_t$ is distributed $P_\kappa$ then $P(\epsilon_t < -a) = P(\epsilon_t > a) = 0.5(1 + a)^{-\kappa}$ for $a > 0$. We draw $20n$ observations for $n \in \{100, 800\}$ from the GARCH process $y_t = \sigma_t\epsilon_t$ and $\sigma_t^2 = 0.05 + 0.05y_{t-1}^2 + 0.90\sigma_{t-1}^2$ with a starting value $\sigma_1^2 = 0.05$, and retain the last $n$ observations for the sample. This is repeated to produce 10,000 samples $\{y_t\}_{t=1}^n$. Our choice of parameter values is indicative of values we obtain in the empirical study below, and frequently encountered in macroeconomic and financial data. The error $\epsilon_t$ is i.i.d. $N(0,1)$, or $P_{2.5}$ standardized such that $E[\epsilon_t^2] = 1$.
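The data generating process can be sketched with the standard library alone. Several details are assumptions made for illustration: the Pareto draw uses inverse-transform sampling from $P(|\epsilon_t| > a) = (1 + a)^{-\kappa}$, the standardization divides by the square root of $E[\epsilon_t^2] = 2/((\kappa - 1)(\kappa - 2))$ for the unscaled law, the persistence coefficient 0.90 is inferred from the true value $\theta_3^0 = 0.9$ reported with the tables, and the function names are hypothetical.

```python
import math
import random

def draw_pareto_error(rng, kappa):
    """Symmetric Pareto draw with P(|e| > a) = (1 + a)^(-kappa), rescaled so E[e^2] = 1."""
    magnitude = (1.0 - rng.random()) ** (-1.0 / kappa) - 1.0   # inverse of the tail function
    second_moment = 2.0 / ((kappa - 1.0) * (kappa - 2.0))      # E[e^2] before rescaling
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * magnitude / math.sqrt(second_moment)

def simulate_garch(n, rng, kappa=None, omega=0.05, alpha=0.05, beta=0.90, burn_factor=20):
    """Draw burn_factor*n observations of y_t = sigma_t * e_t and keep the last n.

    kappa=None gives N(0,1) errors; otherwise standardized symmetric Pareto errors.
    """
    sigma2, y = omega, []                                      # starting value sigma_1^2 = omega
    for _ in range(burn_factor * n):
        e = rng.gauss(0.0, 1.0) if kappa is None else draw_pareto_error(rng, kappa)
        y_t = math.sqrt(sigma2) * e
        y.append(y_t)
        sigma2 = omega + alpha * y_t ** 2 + beta * sigma2      # next period's variance
    return y[-n:]

sample = simulate_garch(100, random.Random(0), kappa=2.5)
print(len(sample))
```

Repeating the call with fresh seeds would give the independent samples on which the estimators are compared.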

We compute the feasible QMTTL and MNWM estimators conditional on the first observation, with parameter space $\Theta = [\iota, 2] \times [\iota, 1 - \iota] \times [\iota, 1 - \iota]$ where $\iota = 10^{-10}$. The iterated volatility variable is $h_1(\theta) = \omega$ and $h_t(\theta) = \omega + \alpha y_{t-1}^2 + \beta h_{t-1}(\theta)$, where we initialize $h_1(\theta) = \omega$ for QMTTL and $(\partial/\partial\theta)h_1(\theta) = [1, 0, 0]'$ for MNWM.

As a benchmark for QMTTL we use strong asymmetric trimming with error fractiles $k_{2,n} = \max\{1, [0.025n/\ln(n)]\}$ and $k_{1,n} = 35k_{2,n}$. This equates to $\{k_{2,n}, k_{1,n}\} = \{1, 35\}$ and $\{3, 105\}$ for $n = 100$ and 800. The fractile for trimming by $y_{t-1}$ is $\tilde{k}_n = \max\{1, [0.1\ln(n)]\}$: asymptotics do not require such trimming, while removing a very few criterion equations due to large $y_{t-1}$ appears to improve the estimator's performance. The benchmark for MNWM is simple trimming $\psi(u, c) = uI(|u| \le c)$. The error fractile is as above $k_n = \max\{1, [0.025n/\ln(n)]\}$ and the fractile for trimming by $y_{t-1}$ is again $\tilde{k}_n$.
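The benchmark fractiles are easy to tabulate. In this sketch the bracket $[\cdot]$ is read as rounding to the nearest integer, an assumption that reproduces the reported values 1 and 3 at $n = 100$ and $n = 800$; the function names are illustrative.

```python
import math

def error_fractiles(n, lam=0.025, ratio=35):
    """Return (k1, k2): k2 = max{1, [lam*n/ln(n)]} and k1 = ratio * k2."""
    k2 = max(1, round(lam * n / math.log(n)))
    return ratio * k2, k2

def y_fractile(n):
    """Fractile max{1, [0.1*ln(n)]} used when trimming by the lagged observation."""
    return max(1, round(0.1 * math.log(n)))

for n in (100, 800):
    print(n, error_fractiles(n), y_fractile(n))
```

Setting `ratio=10` or `ratio=1` gives the weak asymmetric and symmetric variants considered alongside the benchmark.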

In addition to the benchmark estimates, we compute MNWM with Tukey's bisquare and exponential transforms. We also compute QMTTL with weak asymmetric ($k_{1,n} = 10k_{2,n}$) and symmetric ($k_{1,n} = k_{2,n}$) trimming. Recall from Section 2.3 that for QMTTL $k_{1,n} = 35k_{2,n}$ roughly minimizes bias in the Pareto case $P(|\epsilon_t| > a) = (1 + a)^{-2.5}$ when $n = 100$. We show here that using $k_{1,n} = 35k_{2,n}$ even when $n = 800$ still promotes a sharp estimator. In simulations not reported here, we find that the bias minimizing relation $k_{1,n} = 100k_{2,n}$ when $P(|\epsilon_t| > a) = (1 + a)^{-2.5}$ and $n = 800$ logically leads to even smaller bias, but bias is still low when $k_{1,n} = 35k_{2,n}$. Recall also that any combination $\{k_{1,n}, k_{2,n}\}$ works in the Gaussian case provided $k_{i,n} = o(n^{1/2})$. This is violated here since we use $k_{i,n} \sim Kn/\ln(n)$; however, this matters only asymptotically, and we demonstrate that using $k_{2,n} \sim Kn/\ln(n)$ and $k_{1,n} = 35k_{2,n}$ for $n = 100$ and 800 in the thin tail case still leads to a competitive estimator in small samples. Indeed, if we use $k_{i,n}$ of order $n^{1/2}/\ln(n)$ then the small sample performance is essentially identical to what we see here.

Peng and Yao's [52] Log-LAD criterion is $\sum_{t=2}^n |\ln y_t^2 - \ln h_t(\theta)|$. The WLQML criterion is $\sum_{t=2}^n \{\ln h_t^{1/2}(\theta) + |y_t|/h_t^{1/2}(\theta)\}w_t$ where we choose the weights $\{w_t\}$ as in Zhu and
Ling [61], equation (2.4): wt = (max{l, > C')|})“'‘ where C = 

y(o.\o«) '^nd yt-i = 0 Vi > t. 

The PQML estimator detailed in Berkes and Horvath [5], Example 2.3, is based on the criterion $-\sum_{t=2}^n \ln(h_t^{-1/2}(\theta)f(y_t/h_t^{1/2}(\theta)))$ where $f(u) = K(1 + |u|)^{-\vartheta}$ with tail index $\vartheta > 1$. The value $K > 0$ ensures $\int_{-\infty}^{\infty} f(u)\,du = 1$ and of course is irrelevant for estimation, hence we simply set $K = 1$. Identification of $\theta^0$ requires $E[|\epsilon_t|/(1 + |\epsilon_t|)] = 1/\vartheta$, while in the Pareto case $P(|\epsilon_t| > a) = (1 + a)^{-\kappa}$ it is easily verified that $E[|\epsilon_t|/(1 + |\epsilon_t|)] = 1/(\kappa + 1)$, hence we set $\vartheta = \kappa + 1 = 3.5$ in both Paretian and Gaussian cases.^ We also set $\vartheta = 3$ as a control case to see if small sample bias increases when $\epsilon_t$ is Pareto, as it should.

5.2. Simulation results 

Table 1 contains estimator bias, root mse [rmse], and the Kolmogorov-Smirnov statistic scaled by its 5% critical value. We only report results for $\theta_3^0$ in order to conserve space, while the omitted results are qualitatively similar. In Table 2, we report t-test rejection frequencies for tests of the hypotheses $\theta_3^0 = 0.9$, $\theta_3^0 = 0.70$ and $\theta_3^0 = 0.50$, where the first is true. Given the sequence of $R = 10{,}000$ independent estimates of $\theta_3^0$, we use their empirical variance to standardize each estimate for KS test and t-test computation.
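A scaled Kolmogorov-Smirnov statistic of this kind can be sketched as below. The details are assumptions for illustration: estimates are centered at the true value and divided by the root of their mean squared deviation, the reference law is $N(0,1)$, and the 5% critical value is taken to be the large-sample approximation $1.358/\sqrt{R}$; the function name is hypothetical.

```python
import math

def scaled_ks(estimates, true_value):
    """KS distance between standardized estimates and N(0,1), divided by 1.358/sqrt(R).

    Values above 1 indicate rejection of normality at the (approximate) 5% level.
    """
    r = len(estimates)
    sd = math.sqrt(sum((x - true_value) ** 2 for x in estimates) / r)
    z = sorted((x - true_value) / sd for x in estimates)

    def normal_cdf(v):
        return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

    # Largest gap between the empirical and normal distribution functions
    distance = max(max((i + 1) / r - normal_cdf(v), normal_cdf(v) - i / r)
                   for i, v in enumerate(z))
    return distance / (1.358 / math.sqrt(r))
```

A sample whose deviations from the true value are all positive, for instance, lies at least 0.5 away from the normal distribution function at its smallest point and is therefore flagged for any moderately large $R$.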

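^Simply note $P(|\epsilon_t| > a) = (1 + a)^{-\kappa}$ implies $P(|\epsilon_t|/(1 + |\epsilon_t|) > a) = P(|\epsilon_t| > a/(1 - a)) = (1 - a)^{\kappa}$, hence $E[|\epsilon_t|/(1 + |\epsilon_t|)] = \int_0^1 P(|\epsilon_t|/(1 + |\epsilon_t|) > a)\,da = \int_0^1 (1 - a)^{\kappa}\,da = 1/(1 + \kappa)$.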
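The integral behind the choice $\vartheta = \kappa + 1$ can be checked numerically. This is a small illustrative sketch using a midpoint rule; the function name and step count are arbitrary choices.

```python
def expected_ratio(kappa, steps=100000):
    """Midpoint-rule value of the integral of (1 - a)^kappa over [0, 1]."""
    width = 1.0 / steps
    return sum((1.0 - (i + 0.5) * width) ** kappa for i in range(steps)) * width

# For kappa = 2.5 the integral should match 1/(1 + kappa) = 1/3.5
print(expected_ratio(2.5), 1.0 / 3.5)
```

The two printed numbers agree to many decimal places, which is the identification condition $E[|\epsilon_t|/(1 + |\epsilon_t|)] = 1/\vartheta$ at $\vartheta = 3.5$.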


Table 1. Simulation estimation results for $\theta_3$

                      $\epsilon_t \sim P_{2.5}$                            $\epsilon_t \sim N(0,1)$
                      n = 100                  n = 800                  n = 100                  n = 800
Estimator             Bias    RMS(a)  KS(b)    Bias    RMS     KS       Bias    RMS     KS       Bias    RMS     KS
QMTTL-SA(c)           -0.010  0.092   1.75     0.008   0.045   1.45     -0.063  0.095   3.87     0.001   0.030   1.07
QMTTL-WA              -0.031  0.102   3.01     0.024   0.038   2.76     -0.060  0.089   4.76     0.003   0.030   1.34
QMTTL-S               -0.041  0.114   4.69     0.016   0.044   4.21     -0.069  0.075   6.30     0.005   0.032   1.89
MNWM-I(d)             -0.023  0.111   2.31     -0.010  0.064   1.56     -0.029  0.108   2.86     -0.008  0.036   1.17
MNWM-T                -0.019  0.103   2.87     -0.012  0.069   1.61     0.021   0.113   3.16     -0.010  0.039   1.20
MNWM-E                -0.025  0.117   3.13     -0.016  0.058   1.50     -0.026  0.097   3.02     -0.013  0.037   1.31
WLQML(e)              -0.063  0.124   5.92     -0.135  0.107   7.64     -0.092  0.082   8.12     -0.088  0.084   6.05
WLQML_{E|e_t|=1}      -0.082  0.219   8.48     -0.072  0.089   5.64     -0.075  0.078   9.36     -0.065  0.067   3.97
PQML_3                -0.048  0.085   6.17     -0.039  0.059   3.00     -0.065  0.067   9.07     0.005   0.032   1.30
PQML_3.5              -0.034  0.083   4.74     -0.018  0.056   3.17     -0.064  0.062   9.54     0.009   0.029   2.75
PQMTTLf:'|            -0.054  0.116   6.23     -0.017  0.056   2.23     -0.061  0.074   6.38     0.011   0.028   2.65
PQMTTLl^^             -0.031  0.074   4.28     -0.012  0.046   1.35     -0.051  0.074   8.21     0.008   0.028   2.12
PQMTTLf.g             -0.027  0.077   4.05     -0.019  0.057   2.43     -0.055  0.069   8.30     0.011   0.027   2.76
Log-LAD               -0.217  0.165   9.88     -0.253  0.149   9.12     -0.082  0.100   7.01     -0.019  0.046   3.61
QML                   -0.073  0.099   6.23     -0.054  0.078   4.65     -0.112  0.089   8.71     -0.013  0.034   1.64

(a) The square root of the empirical mean squared error.

(b) The Kolmogorov-Smirnov statistic divided by the 5% critical value: KS > 1 indicates rejection of normality at the 5% level.

(c) Benchmark QMTTL-SA (strong asymmetric) uses fractiles $k_{1,n} = 35k_{2,n}$; QMTTL-WA (weak asymmetric) uses $k_{1,n} = 10k_{2,n}$; QMTTL-S (symmetric) uses $k_{1,n} = k_{2,n}$.

(d) Benchmark MNWM-I uses the simple trimming function $\psi(u, c) = uI(|u| \le c)$; MNWM-T and MNWM-E use Tukey's bisquare and exponential transforms.

(e) WLQML is Weighted Laplace QML. WLQML_{E|e_t|=1} is WLQML for processes with $E|\epsilon_t| = 1$. PQML_$\vartheta$ is power-law QML with criterion index $\vartheta$. The PQMTTL rows are tail-trimmed PQML with weak asymmetric ($k_{1,T} = 5k_{2,T}$) or strong asymmetric ($k_{1,T} = 9k_{2,T}$) trimming.
Table 2. Test rejection frequencies(a) at 5% level for $\theta_3$

                      $\epsilon_t \sim P_{2.5}$                            $\epsilon_t \sim N(0,1)$
                      n = 100                  n = 800                  n = 100                  n = 800
Estimator             H0      H1:-0.2 H1:-0.4  H0      H1:-0.2 H1:-0.4  H0      H1:-0.2 H1:-0.4  H0      H1:-0.2 H1:-0.4
QMTTL-SA(b)           0.054   0.694   0.995    0.046   0.951   0.999    0.059   0.664   0.789    0.048   1.00    1.00
QMTTL-WA              0.068   0.431   0.924    0.041   1.00    1.00     0.065   0.067   0.868    0.045   1.00    1.00
QMTTL-S               0.074   0.256   0.880    0.036   1.00    1.00     0.058   0.166   0.942    0.040   1.00    1.00
MNWM-I(c)             0.055   0.521   0.840    0.054   0.899   0.997    0.058   0.716   0.927    0.047   0.998   1.00
MNWM-T                0.043   0.791   0.981    0.055   0.878   0.991    0.031   0.963   0.998    0.054   0.992   1.00
MNWM-E                0.050   0.236   0.907    0.058   0.867   0.982    0.062   0.573   0.981    0.053   0.988   1.00
WLQML(d)              0.058   0.045   0.838    0.038   0.006   0.260    0.047   0.031   0.809    0.043   0.052   0.793
WLQML_{E|e_t|=1}      0.041   0.012   0.493    0.041   0.036   0.436    0.038   0.021   0.771    0.044   0.104   0.965
PQML_3                0.058   0.367   0.955    0.060   0.688   1.00     0.049   0.243   0.980    0.034   1.00    1.00
PQML_3.5              0.051   0.578   0.980    0.058   0.891   0.998    0.045   0.246   0.983    0.039   1.00    1.00
PQMTTLi:’|            0.078   0.122   0.845    0.043   0.863   1.00     0.058   0.260   0.952    0.043   1.00    1.00
PQMTTLj'^             0.053   0.600   0.978    0.056   0.938   1.00     0.060   0.281   0.955    0.051   1.00    1.00
PQMTTLf.s             0.053   0.662   0.978    0.055   0.888   1.00     0.057   0.385   0.972    0.040   1.00    1.00
Log-LAD               0.061   0.009   0.000    0.025   0.000   0.000    0.058   0.046   0.785    0.053   0.951   1.00
QML                   0.065   0.109   0.789    0.061   0.342   0.733    0.061   0.000   0.575    0.051   0.997   1.00

(a) The hypotheses are H0: $\theta_3 = \theta_3^0$, and the two alternatives H1: $\theta_3 = \theta_3^0 - 0.2$ and $\theta_3 = \theta_3^0 - 0.4$, where $\theta_3^0 = 0.9$.

(b) Benchmark QMTTL-SA (strong asymmetric) uses fractiles $k_{1,n} = 35k_{2,n}$; QMTTL-WA (weak asymmetric) uses $k_{1,n} = 10k_{2,n}$; QMTTL-S (symmetric) uses $k_{1,n} = k_{2,n}$.

(c) Benchmark MNWM-I uses the simple trimming function $\psi(u, c) = uI(|u| \le c)$; MNWM-T and MNWM-E use Tukey's bisquare and exponential transforms.

(d) WLQML is Weighted Laplace QML. WLQML_{E|e_t|=1} is WLQML for processes with $E|\epsilon_t| = 1$. PQML_$\vartheta$ is power-law QML with criterion index $\vartheta$. The PQMTTL rows are tail-trimmed PQML with weak asymmetric ($k_{1,T} = 5k_{2,T}$) or strong asymmetric ($k_{1,T} = 9k_{2,T}$) trimming.
Log-LAD and WLQML perform poorly when $E[\epsilon_t^4] = \infty$: in small samples they are sensitive to large error observations, contrary to their theoretical robustness properties asymptotically. Indeed, Log-LAD leads to exceptionally poor inference when $E[\epsilon_t^4] = \infty$ due to a high degree of bias, and is worst overall. Further, WLQML is sensitive to large errors even in the Gaussian case. It is not surprising that Log-LAD and WLQML are similar since Laplace QML merely generalizes LAD to a likelihood framework (Zhu and Ling [61]). QML performs better than Log-LAD and worse than WLQML when $E[\epsilon_t^4] = \infty$, and is better than both when $\epsilon_t$ is normal.

PQML is more promising than QML, Log-LAD and WLQML. It performs better on all measures and in nearly every case: Log-LAD and WLQML are closer to normally distributed for Gaussian $\epsilon_t$ with small $n = 100$. In particular, PQML has the smallest rmse of all estimators in this study, suggesting that it exhibits very low empirical variance since it has higher bias than QMTTL and MNWM. Identification is assured in the Pareto case $\kappa = 2.5$ when $\vartheta = 3.5$, so it is not surprising that bias in the Pareto case is higher when $\vartheta = 3$. Further, there should be noticeable bias in the Gaussian case since identification fails, yet bias is actually smaller than for Paretian errors when $n = 800$. It is important to stress that PQML with index $\vartheta = 3.5$ is perfectly suited for our Paretian case $P(|\epsilon_t| > a) = (1 + a)^{-2.5}$ since this non-Gaussian QML leads to identification and therefore Fisher consistency. However, even this estimator exhibits more bias than QMTTL and MNWM, evidently due to the adverse effects of sample error extremes (see Section 5.3).

The best estimators in this study are QMTTL (with strong asymmetric trimming) and MNWM in terms of bias, approximate normality and test performance, while only PQML has a smaller rmse. QMTTL with strong asymmetric trimming ($k_{1,n} = 35k_{2,n}$), as required in the Paretian case when $n = 100$, is superb when $\epsilon_t$ is Paretian for either $n \in \{100, 800\}$, and works very well in the Gaussian case with a rmse close to PQML. Overall, QMTTL with strong asymmetric trimming is the best estimator since it beats MNWM in terms of bias and approximate normality in nearly every case and has a small rmse in all cases.

QMTTL with weak asymmetric ($k_{1,n} = 10k_{2,n}$) or symmetric ($k_{1,n} = k_{2,n}$) trimming leads to greater bias when $\epsilon_t$ is Paretian, and to negligible bias when $\epsilon_t$ is Gaussian, in each case as this estimator should. Nevertheless, QMTTL with weak asymmetric or symmetric trimming is superior to QML, Log-LAD, and WLQML by all measures; QMTTL with weak asymmetric trimming beats PQML by all measures except rmse; and QMTTL with symmetric trimming beats PQML when $n = 800$. Our QMTTL simulations strongly point to the use of strong asymmetric trimming in general since it is valid for thin tailed errors, and necessary for heavy tailed errors. They also reveal that using weak asymmetric or symmetric trimming still leads to a competitive estimator.

Further, re-centering after trimming in the MNWM estimator in general leads to higher mean-squared-error than QMTTL. Recall this estimator may be less efficient than QMTTL, and QMTTL with strong asymmetric trimming results in the lowest bias of all estimators in this study. Nevertheless, MNWM works well, with the second smallest bias, and overall is closer to normal than all estimators save QMTTL with strong asymmetric trimming. As discussed in Section 2.3.3, the preferred estimator depends on the analyst's agenda: MNWM is always asymptotically unbiased with symmetric trimming which is easy to implement, while QMTTL performs better in small samples.

5.3. Additional experiments for WLQML and PQML

We now perform two additional experiments. First, recall WLQML requires $E|\epsilon_t| = 1$,
which does not hold for either Paretian or Gaussian errors in this study. We now standardize
$\epsilon_t$ such that $E|\epsilon_t| = 1$ to see if ensuring identification helps in small samples. The
results are nevertheless qualitatively similar whether $E[\epsilon_t^2] = 1$ and $E|\epsilon_t| \neq 1$, or $E|\epsilon_t| = 1$,
is true. See Tables 1 and 2. In fact, for heavy tailed errors WLQML actually performs
worse in terms of bias and approximate normality when identification is assured. Further,
inference is still quite poor in many cases. This suggests the previous poor performance
of WLQML is not due to the identification condition failing to hold.
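
The rescaling used in this first experiment can be sketched as follows. The constants assume either a standard normal error, or an error whose absolute value has survival function $(1 + a)^{-\kappa}$ before any other standardization; that is an assumption made for illustration, since the study's exact error construction is described in Section 5. All function names are hypothetical.

```python
import math


def abs_mean_gaussian() -> float:
    # E|Z| for Z ~ N(0, 1)
    return math.sqrt(2.0 / math.pi)


def abs_mean_pareto(kappa: float) -> float:
    # E|e| = integral over a >= 0 of (1 + a)^(-kappa) = 1 / (kappa - 1), for kappa > 1
    if kappa <= 1.0:
        raise ValueError("the first absolute moment requires kappa > 1")
    return 1.0 / (kappa - 1.0)


def standardize(errors, abs_mean):
    # Rescale the errors so that their first absolute moment equals one
    return [e / abs_mean for e in errors]


print(abs_mean_gaussian(), abs_mean_pareto(2.5))
```

Dividing each draw by the relevant constant enforces $E|\epsilon_t| = 1$ in population; in a simulation the sample mean of the absolute values will only be close to one.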

Second, recall that QMTTL has lower bias and is closer to normally distributed
than other estimators whether trimming is needed or not. We therefore tail trim
the PQML criterion to see if the benefits of trimming carry over to non-Gaussian
QML. Recall PQML with index $\vartheta > 1$ has the identification condition $E[u_t] = 0$
where $u_t := |\epsilon_t|/(1 + |\epsilon_t|) - 1/\vartheta$. Define $u_t^{(-)}(\theta) := u_t(\theta)I(u_t(\theta) < 0)$ and
$u_t^{(+)}(\theta) := u_t(\theta)I(u_t(\theta) > 0)$ and their order statistics
$u_{(1)}^{(-)}(\theta) \le \cdots \le u_{(n)}^{(-)}(\theta) \le 0$ and $u_{(1)}^{(+)}(\theta) \ge \cdots \ge u_{(n)}^{(+)}(\theta) \ge 0$.
Let $\{k_{1,n}^{(u)}, k_{2,n}^{(u)}\}$ be intermediate order sequences and let $\{c_{1,n}^{(u)}, c_{2,n}^{(u)}\}$
be positive sequences satisfying $P(u_t(\theta) < -c_{1,n}^{(u)}) = k_{1,n}^{(u)}/n$ and
$P(u_t(\theta) > c_{2,n}^{(u)}) = k_{2,n}^{(u)}/n$. The tail-trimmed PQML (PQMTTL) criterion is

< um < W). 

If $\epsilon_t$ is Paretian with $P(|\epsilon_t| > a) = (1 + a)^{-2.5}$ it is straightforward to show when

n = 100 and when n = 800 renders roughly $E[u_t I(-c_{1,n}^{(u)} \le u_t(\theta) \le c_{2,n}^{(u)})] = 0$.

We therefore set symmetric ($k_{1,n}^{(u)} = k_{2,n}^{(u)}$), weak asymmetric ($k_{1,n}^{(u)} = 5k_{2,n}^{(u)}$) or strong
asymmetric ($k_{1,n}^{(u)} = 9k_{2,n}^{(u)}$) trimming with $k_{2,n}^{(u)} = \max\{1, [0.025n/\ln(n)]\}$. Tables 1 and 2 show
PQMTTL with weak asymmetric trimming performs better than PQML in all cases. If 
we use strong asymmetric trimming then the over-trimming for n = 100 leads to greater 
bias, but when n = 800 the estimator works well as it should, in particular it is closer to 
normal and therefore has better inference than PQML. Conversely, symmetric trimming 
leads to greater bias when n = 800 as it should. QMTTL with strong asymmetric trimming
and MNWM with simple or exponential trimming are better than PQMTTL in
terms of bias and approximate normality in most cases. For example, when n = 800, in
the Pareto case PQMTTL with weak asymmetric trimming is marginally closer to normal
and slightly more biased than QMTTL, and in the Gaussian case PQMTTL is slightly
less biased and farther from normally distributed than QMTTL. Overall, tail-trimming
seems to matter even for an inherently heavy tail robust non-Gaussian QML estimator.
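
The two-sided trimming rule used throughout can be sketched in a few lines. This is a minimal illustration, assuming the convention described in the text: the left-tail and right-tail parts of the series are ordered separately, the $k_1$-th most negative and $k_2$-th largest values serve as thresholds, and an observation is kept when it lies weakly between them. The function name and the weak inequalities are assumptions made for the sketch.

```python
def tail_trim_indicator(x, k1, k2):
    """Return 1 for observations kept by two-sided tail trimming, else 0.

    x  : list of floats (for example the PQML variable u_t)
    k1 : left-tail fractile, 1 <= k1 <= len(x)
    k2 : right-tail fractile, 1 <= k2 <= len(x)
    """
    n = len(x)
    if not (1 <= k1 <= n and 1 <= k2 <= n):
        raise ValueError("fractiles must lie between 1 and len(x)")
    # Left-tail variables x_t * I(x_t < 0), ascending: most negative first.
    neg = sorted(min(v, 0.0) for v in x)
    # Right-tail variables x_t * I(x_t > 0), descending: largest first.
    pos = sorted((max(v, 0.0) for v in x), reverse=True)
    lower, upper = neg[k1 - 1], pos[k2 - 1]
    return [1 if lower <= v <= upper else 0 for v in x]


sample = [-5.0, -3.0, -1.0, 0.5, 1.0, 2.0, 10.0]
print(tail_trim_indicator(sample, 2, 2))  # symmetric fractiles
print(tail_trim_indicator(sample, 3, 1))  # heavier trimming on the left tail
```

Setting `k1` larger than `k2` mimics the asymmetric schemes, which remove more observations from the left tail than from the right.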


6. Empirical application 

Finally, we apply our estimators to asset returns series generated from the London Stock 
Exchange (FTSE-100), the NASDAQ composite index (IXIC), and the Hang Seng Index. 
The period is Jan. 1, 2008-Dec. 31, 2010, representing 757, 757 and 756 daily observations
respectively, net of market closures. We use log-returns $y_t = \ln(x_t/x_{t-1})$ where $x_t$ is the
daily open/close average of each index.®
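
A minimal sketch of the return construction just described, assuming two equal-length lists of daily open and close prices (hypothetical inputs; the function name is illustrative):

```python
import math


def log_returns(opens, closes):
    """Log-returns of the daily open/close average."""
    if len(opens) != len(closes):
        raise ValueError("opens and closes must have the same length")
    # x_t: average of the open and close price on day t
    x = [(o + c) / 2.0 for o, c in zip(opens, closes)]
    # y_t = ln(x_t / x_{t-1}); the first day has no return
    return [math.log(x[t] / x[t - 1]) for t in range(1, len(x))]


print(log_returns([100.0, 102.0, 101.0], [102.0, 106.0, 99.0]))
```

A price series of length n + 1 yields n returns, which is why the observation counts refer to returns rather than raw prices.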

As in Section 5, we compute MNWM using simple trimming denoted “1”, Tukey’s
bisquare and exponential transforms, with fractiles $k_n = \max\{1, [0.025n/\ln(n)]\}$ and
$\tilde{k}_n = \max\{1, [0.1\ln(n)]\}$ for trimming by $\epsilon_t$ and $y_{t-1}$, respectively. Similarly, QMTTL
is computed using strong asymmetric ($k_{1,n} = 35k_{2,n}$), weak asymmetric ($k_{1,n} = 10k_{2,n}$),
and symmetric ($k_{1,n} = k_{2,n}$) error fractiles denoted “SA”, “WA” and “S”, with
$k_{2,n} = \max\{1, [0.025n/\ln(n)]\}$, and $\tilde{k}_n$ for $y_{t-1}$. The parameter space is
$\Theta = [\iota, 2] \times [\iota, 1 - \iota] \times [\iota, 1 - \iota]$ where $\iota = 10^{-10}$.
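
The fractile rules above are easy to evaluate for the sample sizes in this section. The following is a minimal sketch, assuming the square brackets denote the integer part (floor); the function names and the dictionary of scheme labels are illustrative and not taken from the paper.

```python
import math


def error_fractile(n: int) -> int:
    # Right-tail error fractile: max{1, [0.025 n / ln(n)]}
    return max(1, math.floor(0.025 * n / math.log(n)))


def lag_fractile(n: int) -> int:
    # Fractile for trimming by the lagged observation: max{1, [0.1 ln(n)]}
    return max(1, math.floor(0.1 * math.log(n)))


def qmttl_fractiles(n: int, scheme: str):
    # Left-tail fractile as a multiple of the right-tail fractile
    ratio = {"SA": 35, "WA": 10, "S": 1}[scheme]
    k2 = error_fractile(n)
    return ratio * k2, k2


for scheme in ("SA", "WA", "S"):
    print(scheme, qmttl_fractiles(757, scheme), lag_fractile(757))
```

With n = 757 the right-tail fractile is 2 under the floor convention, so the schemes differ only in how many left-tail observations are trimmed.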

See Table 3 for estimation details where standard errors are computed using (25) for 
QMTTL and its logical extension for MNWM. In each case a GARCH model fits well, 
while QMTTL and MNWM produce qualitatively similar estimates. The various MNWM 
estimates are similar across transform type, especially exponential and simple trimming 
versions. The QMTTL estimates are somewhat similar across asymmetric and symmetric 
trimming. For example, evidence for IGARCH or explosive GARCH $\hat\alpha_n + \hat\beta_n > 1$ exists
only for the NASDAQ based on QMTTL-SA and MNWM-1, while QMTTL-WA and
QMTTL-S lead to smaller values. However, in all cases $\hat\beta_n$ is near 0.9 and is near 0.05,
in many cases $\hat\alpha_n + \hat\beta_n \approx 1$, and for each series the various estimates are quite similar. The
latter suggests the various asymmetric and symmetric trimming strategies for QMTTL
work as well as inherently asymptotically unbiased MNWM. This is matched by our
simulations where n = 800 aligns with the sample sizes in the present empirical study:
strong asymmetric trimming leads to the best QMTTL results when $\epsilon_t$ has power law
tails with a small index $\kappa$, but each trimming strategy leads to similar results, especially
when n = 800.


7. Conclusion 

We develop tail-trimmed QML and Method of Moments estimators for GARCH models
with possibly heavy tailed errors $\epsilon_t$ that satisfy $E[\epsilon_t^2] = 1$. In the Method of Moments
case, the model errors are first negligibly transformed with a redescending function, and 
then re-centered to control for small sample bias induced by the transform. We show by 
Monte Carlo experiment that tail-trimming within a QML framework dominates QML, 
Log-LAD and Weighted Laplace QML based on bias, mean-squared-error, approximate 
normality, and inference, and trumps Power-Law QML in all aspects except variance 
(Power-Law QML has higher bias yet lower mean-squared-error). Only QMTTL and 
MNWM directly counter the negative influence of large errors in small and large samples. 
Indeed, we show trimming leads to a better infeasible Power-Law QML estimator in 
small samples. The next stage must involve a theoretical development of data-dependent 
or automatic fractile selection, including possibly bootstrap and covariance determinant 
methods. This is left for future research. 


®The data were obtained from http://finance.yahoo.com, and the open/close average is computed 
using the reported adjusted close values. 

Table 3. QMTTL and MNWM estimation results for financial returns

          ω                 α                 β                 ω                 α                 β

          QMTTL-SA^a: k_{1,n} = 35k_{2,n}                       MNWM-1^b
NASDAQ    0.029 (0.031)^c   0.113 (0.082)     0.893 (0.069)     0.016 (0.008)     0.117 (0.017)     0.884 (0.018)
HSI^d     0.058 (0.064)     0.106 (0.151)     0.878 (0.252)     0.020 (0.009)     0.078 (0.013)     0.915 (0.015)
LSE       0.066 (0.083)     0.213 (0.156)     0.743 (0.224)     0.025 (0.006)     0.119 (0.015)     0.822 (0.020)

          QMTTL-WA: k_{1,n} = 10k_{2,n}                         MNWM-E
NASDAQ    0.017 (0.038)     0.138 (0.113)     0.849 (0.135)     0.032 (0.010)     0.102 (0.021)     0.886 (0.019)
HSI       0.046 (0.086)     0.082 (0.142)     0.910 (0.211)     0.021 (0.011)     0.078 (0.016)     0.915 (0.028)
LSE       0.022 (0.065)     0.179 (0.134)     0.805 (0.194)     0.030 (0.011)     0.125 (0.019)     0.824 (0.031)

          QMTTL-S: k_{1,n} = k_{2,n}                            MNWM-T
NASDAQ    0.012 (0.033)     0.171 (0.145)     0.839 (0.125)     0.033 (0.011)     0.095 (0.031)     0.901 (0.034)
HSI       0.065 (0.092)     0.092 (0.123)     0.887 (0.163)     0.039 (0.011)     0.076 (0.017)     0.920 (0.027)
LSE       0.034 (0.076)     0.214 (0.189)     0.752 (0.203)     0.058 (0.016)     0.122 (0.021)     0.839 (0.024)

^a SA = strong asymmetry; WA = weak asymmetry; S = symmetric.
^b 1 = simple trimming, T = Tukey’s bisquare, E = exponential.
^c Standard errors are in parentheses (·).
^d HSI = Hang Seng; LSE = London Stock Exchange.
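
Appendix: Proofs of main theorems

Recall $\mathcal{E}_t(\theta) := \epsilon_t^2(\theta) - 1$. The score and Jacobian equations are $m_t(\theta) = \mathcal{E}_t(\theta)s_t(\theta)$ and
$G_t(\theta) = \mathcal{E}_t(\theta)d_t(\theta) - \epsilon_t^2(\theta)s_t(\theta)s_t(\theta)'$ where

$$s_t(\theta) := \frac{1}{\sigma_t^2(\theta)}\frac{\partial}{\partial\theta}\sigma_t^2(\theta) \quad\text{and}\quad d_t(\theta) := \frac{\partial}{\partial\theta}s_t(\theta).$$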

Define indicators, trimmed score equations and corresponding covariance and Jacobian 
matrices: 


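$$\hat m_{n,t}(\theta) := m_t(\theta)\hat I_{n,t}^{(\mathcal{E})}(\theta) \quad\text{and}\quad m_{n,t}(\theta) := m_t(\theta)I_{n,t}^{(\mathcal{E})}(\theta),$$
$$\Sigma_n(\theta) := E[m_{n,t}(\theta)m_{n,t}(\theta)'] \quad\text{and}$$
$$\mathcal{G}(\theta) := -E[s_t(\theta)s_t(\theta)'] \quad\text{and}\quad \mathcal{V}_n(\theta) = n\mathcal{G}(\theta)'\Sigma_n^{-1}(\theta)\mathcal{G}(\theta),$$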





and 


By independence and identification Assumption 2, $S_n = \Sigma_n \times (1 + o(1))$.

We implicitly assume all functions in this paper satisfy Pollard’s ([54], Appendix C) 
permissibility criteria, the measure space that governs all random variables in this paper 
is complete, and therefore all majorants are measurable. Cf. Dudley [19]. Probability 
statements are therefore with respect to outer probability, and expectations over
majorants are outer expectations.

A.1. Theorems 2.1 and 2.2

The proofs of QMTTL consistency and asymptotic normality Theorems 2.1 and 2.2
require supporting lemmas. We state them when required and provide proofs in
Appendix A.3. Consistency requires an asymptotic approximation, variance bounds,
and laws of large numbers. Unless otherwise noted, Assumptions 1 and 2 hold.

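Lemma A.1 (Asymptotic approximation). (a) $n^{-1/2}\Sigma_n^{-1/2}\sum_{t=1}^n\{\hat m_{n,t} - m_{n,t}\} = o_p(1)$;
(b) $\sup_{\theta\in\Theta}\|1/n\sum_{t=1}^n\{\hat m_{n,t}(\theta) - m_{n,t}(\theta)\}\| = o_p(\sup_{\theta\in\Theta}E\|m_{n,t}(\theta)\|)$.

Lemma A.2 (Variance bounds). Under Assumption 1, (a) $\Sigma_n = o(n/\ln(n))$; (b) $S_n = o(n/\ln(n))$.

Remark 11. Under Assumption 2, $S_n = \Sigma_n(1 + o(1))$, hence (b) follows from (a).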


Lemma A.3 (LLN and ULLN). (a) = Op(l); (b) supgg 0 {j|l/n x 

Yyy=i'mn,t{0) - E[mn,t{0)]\\} = Opisupg^Q E\\m^49)\\). 

Asymptotic normality requires an expansion, central limit theorem, and Jacobian consistency.

Lemma A.4 (Asymptotic expansion). Let $\{\theta_n\}$ and $\{\tilde\theta_n\}$ be any sequences of random
variables in $\Theta$ with probability limit $\theta^0$. Let $\theta_{n,*} \in \Theta$ satisfy $\|\theta_{n,*} - \tilde\theta_n\| \le \|\theta_n - \tilde\theta_n\|$,
which may be different in different places. (a) =

^n{dn,*) X (6>„ - On) X (1 Op(l))4; (b) “ ™n.i(0n)} = Qn{0n,*) X 

{On -On) X (l-bOp(l)). 

Lemma A.5 (CLT). ^ 

Lemma A. 6 (Jacobian), (a) Gn{0D = t/ x (1 -|- Op(l)) and §^nOD = t/ x (1 -|- Op(l)) 
for any On A’ 0°; (b) l/nY)t=i^t{A)5't{0n) = G x {1 + Op{l)); (c) {d/dO)E[mn,t{0)]\eo = 
G X (l-ho(l)); (d) limsup„^^supege-E^IN„.t(6')|| < K\\G\\ x (l-bo(l)). 


We are now ready to prove Theorems 2.1 and 2.2. 


Proof of Theorem 2.1. Define $\hat m_n(\theta) := 1/n\sum_{t=1}^n\hat m_{n,t}(\theta)$, $m_n(\theta) := 1/n\sum_{t=1}^n m_{n,t}(\theta)$,
$M_n(\theta) := E[m_{n,t}(\theta)]$ and $e_n := \sup_{\theta\in\Theta}E\|m_{n,t}(\theta)\|$. We use an argument in Pakes and
Pollard [50], pages 1038-1039.

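Step 1. We first prove a required inequality:

$$e(\delta) := \liminf_{n\to\infty}\inf_{\theta\in\Theta:\|\theta-\theta^0\|>\delta}\{\|M_n(\theta)\|/e_n\} > 0 \quad\text{for any small } \delta > 0. \tag{A.1}$$

Note $E[m_{n,t}] \to E[\mathcal{E}_t s_t] = 0$ by dominated convergence and independence. By the definition
of a derivative and Lemma A.6(c) we have $E[m_{n,t}(\theta)] = \mathcal{G} \times (\theta - \theta^0) \times (1 + o(1))$
where $\mathcal{G} = -E[s_t s_t']$, and bound Lemma A.6(d) states
$e_n := \sup_{\theta\in\Theta}E\|m_{n,t}(\theta)\| \le K\|\mathcal{G}\| \times (1 + o(1))$. It therefore follows for every $n \ge N$ and $\delta > 0$

$$\inf_{\|\theta-\theta^0\|>\delta}\{e_n^{-1}\|E[m_{n,t}(\theta)]\|\} \ge K\inf_{\|\theta-\theta^0\|>\delta}\left\{\frac{\|\mathcal{G} \times (\theta - \theta^0)\|}{\|\mathcal{G}\|}\right\} \times (1 + o(1)) > 0.$$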


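Step 2. In view of (A.1) we have $P(\|\hat\theta_n - \theta^0\| > \delta) \le P(\|M_n(\hat\theta_n)\|/e_n > e(\delta))$, hence
it suffices to show $\|M_n(\hat\theta_n)\|/e_n = o_p(1)$ in order to prove $\hat\theta_n \to_p \theta^0$. By Minkowski’s
inequality

$$\|M_n(\hat\theta_n)\|/e_n \le \|\hat m_n(\hat\theta_n)\|/e_n + \|\hat m_n(\hat\theta_n) - M_n(\hat\theta_n)\|/e_n = \mathcal{A}_{1,n}(\hat\theta_n) + \mathcal{A}_{2,n}(\hat\theta_n),$$

say. The proof is complete if we show $\mathcal{A}_{1,n}(\hat\theta_n)$ and $\mathcal{A}_{2,n}(\hat\theta_n)$ are $o_p(1)$.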

Consider $\mathcal{A}_{1,n}(\hat\theta_n)$. We exploit theory developed in Cizek [16], Lemma 2.1, page
29. By distribution continuity and linearity of the volatility process $\{\sigma_t^2\}$,
$\hat Q_n(\theta) := 1/n\sum_{t=1}^n(\ln\sigma_t^2(\theta) + y_t^2/\sigma_t^2(\theta))\hat I_{n,t}^{(\mathcal{E})}(\theta)$ is almost surely twice differentiable at $\hat\theta_n$. In
particular, up to a scalar constant $(\partial/\partial\theta)\hat Q_n(\theta)|_{\hat\theta_n} = \hat m_n(\hat\theta_n)$ a.s. By a minimum
$\hat Q_n(\hat\theta_n) \le \hat Q_n(\theta)$ $\forall\theta \in \Theta$ it follows $\|\hat m_n(\hat\theta_n)\| = 0$ a.s., while $\liminf_{n\to\infty} e_n > 0$ by
distribution non-degeneracy and trimming negligibility, hence $\mathcal{A}_{1,n}(\hat\theta_n) = 0$ a.s.

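Next $\mathcal{A}_{2,n}(\hat\theta_n)$. By Lemma A.1(b) $\sup_{\theta\in\Theta}\|\hat m_n(\theta) - m_n(\theta)\|/e_n = o_p(1)$, and
$\sup_{\theta\in\Theta}\|m_n(\theta) - M_n(\theta)\|/e_n = o_p(1)$ by ULLN Lemma A.3(b). Hence

$$\sup_{\theta\in\Theta}\{\mathcal{A}_{2,n}(\theta)\} \le \sup_{\theta\in\Theta}\frac{\|\hat m_n(\theta) - m_n(\theta)\|}{e_n} + \sup_{\theta\in\Theta}\frac{\|m_n(\theta) - M_n(\theta)\|}{e_n} = o_p(1).$$

□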


Proof of Theorem 2.2. Use l/u-X^tLi ™n.t(0n) = 0 a.s. by the proof of Theorem 2.1, 
and expansion Lemma A. 4(b) to deduce for some 0*, ||0* — 0°|| < \\0n — 


^ A ^ 1 , \ 

QniKWn - 0°)(1 + Op(l)) + - X! ® 


(A.2) 


Consistency ||0* — 0°|| < ||0„ — 0°|| A 0 by Theorem 2.1 ensures Gn{9^) = 1/(1 +Op(l)) 
by Lemma A. 6 (a). Multiply both sides of (A.2) by rearrange terms and use 

Vn = nQ'Y.f^G to deduce Vn‘^{9n - 9°) = x (1 + Op(l)). In view 

of n~^^^Tjn ~ '>TT'n,t} = Op(l) by Lemma A. 1(a), we have 

1 " 

Vy2(0„ _ 00) ^ ^ ^ 0^(1))^ 


hence $\mathcal{V}_n^{1/2}(\hat\theta_n - \theta^0) \to_d N(0, I_q)$ by Lemma A.5. Finally $\mathcal{V}_n \to \infty$ follows from the fact
that $\|\mathcal{G}\| > 0$, and $\|n\Sigma_n^{-1}\| \to \infty$ by Lemma A.2(a). □


A.2. Remaining theorems 

Define hf{9) := {d/d9)ht{9) and (9) := {d/d9)h^{9). We require stationary solutions 
(0), h*((0),/i*j’^(0)} of the volatility process {ht{9),hf ^{9),h^f (9)} in order to prove 
the asymptotic equivalence of the infeasible and feasible QMTTL estimators. 

Let {S( (0),hj (0)} denote {st(0),Ot(0)} evaluated with {hi{9), h*^^{9), (9)}. Define 

error and volatility derivatives evaluated at {ht{9), h^(9), h^’^(9)} 

~e,{9)-.= ^=, ?’t(0):=e1(0)-l, 

mt{9) ■.= ?t{9)st{9), Gt{9) := ^mt{9) and §’:=-U[st(0)s((0)]. 
Define f'lie) := /(^J,„)(0) < ^£„)(0)) and let {F„(0),Zi„(d)} satisfy P(g’t(0) <

-^niO)) = fci,„/n and P(?t(0) > W„(d)) = fc 2 ,„/n. Similarly := J(-F„(0) < 

rp ~ ^ '^(^) 

w(^) Define trimmed variants rhn,tiO) := and rhn,ti9) :=fht{0) x 

liiid)- 


Lemma A.7 (Stationary solution). Let at{e) e and a*{e) e 

(a) A stationary and ergodic solution a^{9) exists for each 0 £ 0, it is cr(yT-:r < 

t — 1)-measurable, and infgge (0) > 0 a.s. Further, h'l{9^) = a.s., h^^{9) = 

{d/de)h*{e) andhf'\e) = {d/de)hf{e) a.s. 

(b) P[supgg 0 |aj( 0 )|''] < cxD for some tiny t > 0 . 

(c) If at{0) is any other stationary solution then P[(supgg 0 la^ (0) — at(0)|)‘] = o(p‘) 
for some p £ (0,1). 

(d) P[supgg 0 \Wf{0) - Wti0)\] =o(p*) for each wt{0) £ {Si,t( 0 ), c)ij,t( 0 )}. 

(e) l/nX;r=i^[suPe6e \in}{0) - ln}{0)\] and E[supg^Q l^.t W “ In,k^)\] 

are o(l). 


Proof of Theorem 2.3. We first characterize properties of random variables based on 
ht (0). We then prove consistency of the feasible QMTTL estimator 0„ A 0°. Lastly, we 
prove the claim Vy^(0„ — 0„) A 0. 

Define m„(0) := l/nX^Ai '^n{0) ■= -^^(0) := 1/n x 

Y^t^iE[rhn,t{0)], and e„ := l/nX]r=iSupeg0P||m„,t(0)||, and recall e„ := 

supeg0P||m„,t(0)||. 

Step 1: Use Lemma A.7 to obtain |e„ — e„| < supgg0 l/nX]"=i ll™ra.i(^) ~ nrn,t(0)]|| = 
0,(l/n) = 0,(1). Similarly, ||l/nELi ||G*(0)il^)(0)-G,(0)/^)(0)||, |ll/nELi l|G*(0) x 

A, t(^) ~ Gt{0)i^}{0)\\\, and ||§*(0) — G{0)\\ are uniformly 0,(1), and for any se¬ 
quence of positive numbers {g„}, -)■ oo, supgge{l/g„I |et(^')| - |et(^')l 1 = 

0,(1). Use the latter to deduce supgg0 A:A^|^j;,^^(0) — £’^^“^^(0)| A 0, hence by Lemma 

B. 2 sup,g0|g5“|„)(0)/£„(0) + 1| = 0,(l/fcJ/„2) and sup^^g |g(2„)(^)/^"(^) " = 

0,(1/^^ n). By similar arguments and Lemma A.7 it is straightforward to verify Lemmas 
A.l, A.S and A.4 extend to m„(0) and m„_t(0). 

Step 2 (0n A 0°).- We follow the proof of Theorem 2.1. By the Lemma A. 6(c), (d) 
arguments and ||§*— t/|| = o(l) it follows 1/n^Ai A'[™n.t(0)] = G x (0 — 0°) x (1 + o(l)) 
and e„ < Ar||t/||. Since ||t/|| > 0 it follows e(S) := liminf n—>■00 inf||e_eo||>5{c„^|ll/n X 
S"=i> 0 for every n> N and (5 > 0. Therefore P{\\0n — ^°|| > S) <
P{\\Mn{0n)\\/tn>m)- ^ remains to show||^„(0„)||/e„ = Op(l). 

Note ||^„(0„)|| < ||to„( 0„)|| + ||m„(0„) - Mn{0n)\\, where to„( 0„) = 0 a.s. by a 
minimizer. It remains to show ||m„(0„) — M^n{0n)\\/^n = Op(l)- Note 


sup||m„(0) - Mn{0)\\ 

9e0 

< sup||to„( 6») - m„(6»)|| + sup||m„(6») - 7W„(6')|| + sup||^„(6') - 7W„(6»)||. 
eee eee eee 

The first and third terms on the right-hand side are Op(l) by Step 1. The second is Op(e„) 
by the proof of Theorem 2.1. Since |e„ — e„| = Op(l) we have shown sup^gg ||m„(0) — 
^n(6')|| =Op(e„) hence ||m„(0„) - 7^„(0„)||/e„ =Op(l) as required. 

Step 3 (Vn^ifin — 0n) ^ Oj; The first order conditions are ^t=i''^n,t{0n) = 0 
^ - - - \i£) 

a.s. and TO„,t(6'n) = 0 a.s. Combine 0n^0°, sup^^e Hl/nX^Li “ 

'^(S) 

Gt{0)inj{0)]\\ =Op(l), and ||§’-t/|| =o(l) to deduce by Lemma A.6 Gt{0n)ln,t = 

Q X (l-l-Op(l)). Therefore, in view of consistency of the infeasible estimator A 0°, 
expansion Lemma A.4, and the construction Vn = it follows 

-i n - n 

” t=i t=i 

(A.3) 

= vy2(0„_0„)(i + op(l)). 

Further, by two applications of Lemmas A. 1(a), A.4 and A.6, and cancelling the terms 
Vn'^ijtn - 0°) = rA^‘^T^n^^^Q{0n “ 6*°), we have 


1 ^ - 

1 ” ^ 

= 1/2^^ri ^ ^ X T^ri,t{0n) ~ TTlnu} 

^ t=l 

n 1 ^ 

^ 'y ] {TAn,ti0n) ~ H TJ^n ^ ^ ~ 

t=l ^ t=l 

1 ” 

~ j^l/2 ^ ~ Wn,*} 

+ - 0°)(1 + Op(l)) - n^/^^-^/^g{0^ - 0°)(1 + Op(l)) 
1 "


Combine (A.3), (A.4) and Theorem2.2to obtain Vy^(0„ — 0„) = — 

”T-n,t}(l + Op(l)). By Loeve’s inequality, liminf„_,.oo ||S„|| > 0 in view of non-degeneracy 
and trimming negligibility, and Lemma A. 7(d), it follows for tiny i > 0, p £ (0,1), and 
sufficiently large n and K 


E 


ll/2 




t=l 



Therefore, Vn'^{0n — On) = Op(l) by Markov’s inequality. 


□ 


Proof of Theorem 3.1. By Assumption 4 $\psi(u,c) = u\varpi(u,c)I(|u| \le c)$ behaves like
$uI(|u| \le c)$ as $c \to \infty$. See (21). In the following, we therefore only treat the simple
trimming transform $\psi(u,c) = uI(|u| \le c)$. The general case with properties (21) and (23)
has a similar proof.

Lemmas A.1-A.6 extend to cover the equations 

rh„,t(0)= |^e?(d)/(:l(0)-^Xe?(0)/(:l(d)^ x 5^(0), 

rhnAO) = {eU0)li%0) - E[eA0)lLim) x {^t{0) - E[st{9)]). 

Consider Lemma A. 1(a). By Lemma A.l, it follows 




.2f(0 


X St 






“ X (“ “ X St -f 0p(i), 


where by independence and dominated convergence S]„ ~ E[{e^Inl ~ ^[£^4^(1])^] ^ 
E[sts'i\ ='■ o'^6. Now add and subtract £'[e2/^'^j] and A'[st] to deduce 


v-i/2 V { - i V Af'' 


2fC) 


X St 


t=l 


t=l 


1 " 

= S-1/2 _ E[Atli% X {St - E[st]) + Op(l) 


(A.5) 


7,1/2 


n / 1 ^ \ 

-X^‘“^N x(l+Op(l)). 
\ / 
Under Assumption 1 St is stationary, ergodic and integrable, hence l/nX]"=i ~ =

Op{l), and by a generalization of central limit theorem Lemma A.5 ~ 

E[e^I^^f]) = Op(l). The second term in (A. 5) is therefore Op(l), hence x 

—'>Tin,t) = Op(l) which extends Lemma A. 1(a) to In view of 

L 2 +t-boundedness of supggj^^ l|st(0)|| for some compact No C 0 with positive Lebesgue 
measure and containing 0°, and independence of e*, the arguments used to prove Lemmas 
A. 1(b), A.2-A.6 carry over with simple modifications to cover The claims 

therefore follow by imitating the proofs of Theorems 2.1 and 2.2, and by the constructions 
of Vn and Vn- □ 

Lemma A.8. 4 1. 

Proof of Theorem 4.1. The claim follows from Jacobian consistency Lemma A.6(b)
and Lemma A.8. □

A.3. Proofs of supporting lemmas 

In order to decrease the number of cases we augment Assumption 1(b) and impose power
law tails on $\epsilon_t$ in general:

$$P(|\epsilon_t| > a) = da^{-\kappa}(1 + o(1)) \quad\text{where } d \in (0,\infty) \text{ and } \kappa \in (2,\infty). \tag{A.6}$$

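Notice $\mathcal{E}_t(\theta)$ is stationary and ergodic on $\Theta$ by (2), and also has a power law tail. The
latter follows by noting

$$\mathcal{E}_t(\theta) = \epsilon_t^2\frac{\sigma_t^2}{\sigma_t^2(\theta)} - 1,$$

where $E(\sup_{\theta\in\Theta}|\sigma_t^2/\sigma_t^2(\theta)|)^p < \infty$ for any $p > 0$ under Assumption 1. Since $\epsilon_t$ is independent
of $\sigma_t/\sigma_t(\theta)$ the product convolution $\epsilon_t \times (\sigma_t/\sigma_t(\theta))$ has tail (A.6) with the same
index $\kappa > 2$ (Breiman [10]). In general $\lim_{a\to\infty}\sup_{\theta\in\Theta}\{|a^{\kappa}P(|\epsilon_t(\theta)| > a) - d(\theta)|\} = 0$ and
$\inf_{\theta\in\Theta}\{d(\theta)\} > 0$ and $\sup_{\theta\in\Theta}\{d(\theta)\} < \infty$. Hence, in view of (14), $\mathcal{E}_t(\theta) := \epsilon_t^2(\theta) - 1$ also
satisfies

$$\lim_{a\to\infty}\sup_{\theta\in\Theta}\{|a^{\kappa/2}P(|\mathcal{E}_t(\theta)| > a) - d(\theta)|\} = 0 \quad\text{where } \inf_{\theta\in\Theta}\{d(\theta)\} > 0 \text{ and } \sup_{\theta\in\Theta}\{d(\theta)\} < \infty. \tag{A.7}$$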


Recall $P(|\mathcal{E}_t(\theta)| > C_n(\theta)) = k_n/n$ holds for $C_n(\theta) = U_n(\theta)$ and $k_n = k_{2,n}$. Then by (A.7)

$$C_n(\theta) = d(\theta)^{2/\kappa}(n/k_n)^{2/\kappa}. \tag{A.8}$$
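
The threshold formula can be sanity-checked by plugging it back into the tail law. The sketch below assumes the tail in (A.7) holds exactly, that is $P(|\mathcal{E}_t| > c) = d\,c^{-\kappa/2}$ with no remainder term, and uses illustrative values for $d$, $\kappa$, $n$ and $k_n$; the function names are hypothetical.

```python
def threshold(d: float, kappa: float, n: int, k: int) -> float:
    # C_n = d^(2/kappa) * (n/k)^(2/kappa), as in (A.8)
    return d ** (2.0 / kappa) * (n / k) ** (2.0 / kappa)


def tail_probability(d: float, kappa: float, c: float) -> float:
    # Exact power law assumed for the centred squared error: P(|E| > c) = d * c^(-kappa/2)
    return d * c ** (-kappa / 2.0)


d, kappa, n, k = 1.5, 2.5, 800, 2
c_n = threshold(d, kappa, n, k)
print(c_n, tail_probability(d, kappa, c_n), k / n)
```

Under the exact power law the exceedance probability at the threshold equals $k_n/n$, which is the defining property of the threshold sequence.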
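Further, by (A.7) and an application of Karamata’s theorem:

$$\text{if } \kappa = 4:\quad E[\mathcal{E}_t^2(\theta)I(|\mathcal{E}_t(\theta)| \le C_n(\theta))] \sim d(\theta)\ln(n),$$
$$\text{if } \kappa < 4:\quad E[\mathcal{E}_t^2(\theta)I(|\mathcal{E}_t(\theta)| \le C_n(\theta))] \sim \frac{\kappa}{4-\kappa}C_n^2(\theta)P(|\mathcal{E}_t(\theta)| > C_n(\theta)) = \frac{\kappa}{4-\kappa}d(\theta)^{4/\kappa}(n/k_n)^{4/\kappa-1}. \tag{A.9}$$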
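
The Karamata-type approximation for the truncated second moment can be illustrated on a case with a closed form. The sketch assumes a variable $X \ge 1$ with exact tail $P(X > x) = x^{-\alpha}$, where $\alpha = \kappa/2 < 2$ plays the role of the tail index of the centred squared error; both functions are illustrative and not taken from the paper.

```python
def truncated_second_moment(alpha: float, c: float) -> float:
    # E[X^2 I(X <= c)] for P(X > x) = x^(-alpha), x >= 1, with alpha in (0, 2):
    # integral from 1 to c of x^2 * alpha * x^(-alpha - 1) dx
    return alpha * (c ** (2.0 - alpha) - 1.0) / (2.0 - alpha)


def karamata_approximation(alpha: float, c: float) -> float:
    # alpha / (2 - alpha) * c^2 * P(X > c)
    return alpha / (2.0 - alpha) * c ** 2 * c ** (-alpha)


kappa = 2.5
alpha = kappa / 2.0  # so that alpha / (2 - alpha) = kappa / (4 - kappa)
for c in (1e2, 1e4, 1e8):
    ratio = truncated_second_moment(alpha, c) / karamata_approximation(alpha, c)
    print(c, ratio)
```

In this example the ratio equals $1 - c^{\alpha - 2}$, so it approaches one as the threshold grows, which is the sense in which the approximation holds.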

Uniform bounds are similar given (A.7)-(A.9). For example, when k < 4: 

f;[£: 2(0)/(|£(0)| <c„(d))]} ^ 

Unless otherwise noted, and in view of (14) and (A.7), we assume two-tailed trimming
to reduce notation, hence thresholds and fractiles are simply $C_n(\theta)$ and $k_n$, and order
statistics are $\mathcal{E}_{(i)}^{(a)}(\theta)$ where $\mathcal{E}_t^{(a)}(\theta) := |\mathcal{E}_t(\theta)|$.

The proofs of Lemmas A.1-A.8 require two supporting results. See the supplementary 
material Hill [32] for proofs. First, trimming indicators satisfy a uniform CLT. 

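Lemma B.1 (Uniform indicator CLT). Define
$\mathcal{I}_{n,t}(\theta) := (n/k_n)^{1/2}\{I(|\mathcal{E}_t(\theta)| \le C_n(\theta)) - E[I(|\mathcal{E}_t(\theta)| \le C_n(\theta))]\}$. Then
$\{n^{-1/2}\sum_{t=1}^n\mathcal{I}_{n,t}(\theta) : \theta \in \Theta\} \Rightarrow^* \{\mathcal{I}(\theta) : \theta \in \Theta\}$,
where $\mathcal{I}(\theta)$ is a Gaussian process with uniformly bounded and uniformly continuous sample
paths with respect to $L_2$-norm, and $\Rightarrow^*$ denotes weak convergence on a Polish space
(Hoffmann-Jørgensen [35]).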

Second, intermediate order statistics are uniformly bounded in probability. 

Lemma B.2 (Uniform order statistic bound).
$\sup_{\theta\in\Theta}|\mathcal{E}_{(k_n)}^{(a)}(\theta)/C_n(\theta) - 1| = O_p(1/k_n^{1/2})$.

Lemmas A.1, A.3, A.4 and A.6 are similar to results proven in Hill [31], Appendix A,
hence their proofs are relegated to the supplementary material Hill [32].

Proof of Lemma A.2. 

Claim (a): St is L 2 -i-t-bounded by Assumption 1, hence by error independence 
E[£fl^J] X E[5[^] ^ KE[£fl^J]. The claim now follows from arguments leading to The¬ 
orem 2.5. 

Claim (h): We prove the claim for Si^i^n, so let denote rui^t, hence St denotes and 
express as 5„. Note ~ E[ml_^[\ -f 2X)”r/(l - i/n)E[mn,imn,i+i]- If E[m[] < oo 
then Sn'^ Ki = o(n/ ln(n)) in view of geometric /3-mixing (cf. Ibragimov [36]). 

Now assume E[m[] = oo. We first characterize the tails of m* = — l)st, and then 

bound 1]”=/ |U[m„,im„,i+i]l. 
Step 1: By Assumption 1(b) and (A.6) independent et has a power law tail with index
K G (2,4], and since + /3° > 0 it follows E[s^] < oo. Therefore has a power law tail 
with index K^n '■=k/2g (1,2], cf. Breiman [10]. 

Step 2: Dehne quantile functions Qn{u) = inf{m > 0 : P{\mn,t\ > m) < u} and Q{u) = 
inf{m > 0: P(jmtj > m) < u} for u G [0,1], recall geometric /3-mixing implies a-mixing 
with coefficients ah< Kp^ for pG (0,1). By Theorem 1.1 of Rio [56] 

n-l n-l „2ai pKp' 

^lA[m„,im„,,+i]l < 2^ / Ql{u)du<2'^ Ql{u)du. 

i=l i=i "'o i=i "'o 

Tail-trimming mn,t = TUtln t coupled with distribution continuity imply P{mn,t = 0) = 
knjn. Thus Qn{,u) = 0 for m G [0,fc„/n] and Qn{u) = Q{u) for u G (fc„/n, 1]. Further, 
under the Step 1 power law properties Q(u) = Therefore 

n —1 n—1 pKp^ n—1 

^lA'[m„,im„,i+i]l <K'^ u“^/'"dM< A"^max{0, 

i=i i=i 

Kln{n/kn) 

Moreover ~ = Arin(n/fc„) x (n/fc„)"‘/''“^(l + 

0(1)) and kn =o(n) hence 

n — 1 

^lA[m„,im„,i+i]l < K\n{n/kn) x (n/fc„)'‘/'"“^(l -f 0(1)) 

< K\n{n/kn) x (njkni ^^< Arin(n)(n/fc„)^'^”“^. 

Further, ln(n)(n/fc„)^/”“^ =o(n/ln(n)) since kn^oo and kG (2,4]. Finally, by Step 1 
and (A.9) A[m^ ^\ ~ K{n/kn)‘^^'^~^ if k < 4 and A[to^ j] ~ A'ln(n) if k = 4. Therefore 
Sn < K\n{n){n/kn)^^’^~^ = o( n/ln(n)) which completes the proof. □ 

Proof of Lemma A. 5. By identification Assumption 2 m„_t = 

- E[mn,t]} + o(l). Define Zn,t ■= {mn,t - E[mn,t]} for 

any r G R'^, r'r = l. Note by error independence, dominated convergence and (6): 

-^(Sr=i We will prove ^n,t A^(0,1), hence the claim will follow 

from the Cramer-Wold Theorem. Define := (T(yr - t <t). 

In view of geometric /3-mixing and stationarity under Assumption 1, and E[zf j] = 1 it 
suffices to show the three conditions of Theorem 2.1 in Peligrad [51] hold.^ The first two 

^We require a result like Theorem 2.1 in Peligrad [51] since asymptotically Zn,t need not have finite 
moments higher than two. 
are 1/^StLi t] < ^he Lindeberg condition t^{\zn,t\ >

en^/^)] —>■ 0 Ve > 0. By construction E[z‘^^^\ = 1 hence E[^n,t\ = 1 + o(l) which 

verihes the first. 

The Lindeberg condition holds if k > 4 since i5|et|^+'' < oo and E\Si^t\‘^~^'' < oo for some 
(. > 0, hence limsup^^^^ E\zn,t\'^'^'' < oo. Now suppose k < 4, assume E[£tl^^] = 0 to sim- 
plify notation, and note Zn^t = £tln ' St- By independence and L 2 -boundedness 

of St it follows E„ = E[£^I^J] x 6 where 6 = E[sts't\ is finite and positive defi¬ 
nite. By construction liminf„^ooinf^v^i ||Sn||(?'^S„>0 a.s., by independence 
£^ X x ||E„|| has Paretian tails with index k/ 4< 1, and by trimming 

\£^I^] \ < KC^. Therefore, for finite A" > 0 that may be different in different places. 


E[zlJizlt > e^n)] < Ke( {r'E-^/^tf x E £fli^}l( > 


’'t -^n,t \ '-'t -^n,t 


9 


{r'E-^/^Stf 

< KE{{r’Y.-^l\tf X E{£‘ll'ill[£ll'il > Ke^nE{£ll^^l\)\^t-,\) 

/ r 

<KE{{r'j:-^/htf xE / 

\ Ub 


UKe^nE[efli^J] 




In general = Kinjk^'^l^. If k = 4 then E\£’lI^J\ ~ K\n{n)) hence = K{n/kn) < 
Ke^nE[£^I^l] as n —>■ oo. This implies for some € N and all n > N that 
= ^' If ^ < 4 then E[£?li^J] ~ KCl{kJn) = K(nlk^YI>^-\ 

hence again C\ = K{n/kn) < Kn{nlkn)'^l'^~^ = K£^nE[£^ll^J] as n ^ oo. Therefore, 
E[z^^I{z^^^ > £^n)] = 0 for some G N and all n> N. This proves I/nX]”=i ^ 

I{\zn,t\ > £n^/2)] -)■ 0 Ve > 0. 

The third condition concerns the maximum correlation coefficient p{A, B) := 
sup^gj;, 2 (.A), 3 eL 2 (s) I 5)1 defined on L 2 {'S) the space of L 2 -bounded ^-measurable 

random variables. We require the interlaced coefficient := sup „>2 supg^, p{cr{zn,i ■ i G 
Tk),cr{zn,j:j G Afc)) to satisfy limfc_j.oo < 1, where Tk, Sk C {1,... ,n} are non-empty 
subsets with infsgSj,_tgTfc{|s — t|} > k, and sup 5 ^ is taken over all sets {Sk,Tk} for 
a given distance k. See equations (1.2), (1.7) and (1.8) in Peligrad [51]. In view of the 
GARCH process and Assumption 1, {zn^t A <t < n}n>i is a first order Markov chain 
that is stationary over 1 < t < n, and by geometric /3-mixing it is also geometric a-mixing. 
Since p\<\ as a, consequence of independence of e*, it therefore follows pj —0 by an 
extension of Theorem 3.3 in Bradley [9] to triangular arrays. □ 


Proof of Lemma A.7. Claims (a)-(c) follow from the Assumption 3 response Lipschitz
properties. See Francq and Zakoïan [24, 25] and Meitz and Saikkonen [44]. Claim (d)
follows from stationarity, independence of $\epsilon_t$, and (b) and (c).

Consider (e). We will prove A[supgge \ i'^J{9) — (P)j] = o(l), the second 

claim being similar. We can approximate I{u) := I(u < 0) with the regular sequence 
{3n('u)}n>i, defined by := Jff'^I{w)S{J\fn{w - u))J\fne where S(^) =

g-i/(i-^^) j jf 1^1 ^ I = 0 if 1^1 > 1. Here {A/’„} is a sequence of 

finite positive numbers, Mn —>■ oo, the rate to be chosen below. See Lighthill [40]. 3„(u) 
is uniformly bounded in u, continuous and differentiable. Also, {d/du)I{u) has a regular 
sequence D„(u) := (A/’n/n:)^/^exp{—A/’n'a^}- 

Define et(a) := jftj — a and ttia) := |?t| — a, and let Sii(O) satisfy P(|Ft(0)| > 5^(0)) = 
k„/n. Hence (9) = I{ct{S^n{9)) and = I{zt{Cn{9)). Note Mn —>■ oo can be made as 
fast as we choose such that supgg 0 |/(et(C„( 0 )) — I{ct{&n{9))\ < K supgg 0 p„(et(C„( 0 )) — 
'3n{<^t{S^n{9))\ + Op(l), and —>■ 0 as fast as we choose. Hence, by the mean- 

value-theorem and boundedness of 2 )„(u) it follows supgg 0 |/(et(C„( 0 )) — I{zt{&n{9))\ < 
-f^supgge|?t( 6 ») - £ 4 ( 0 ) 1 -H A:supgge|P„( 6 ») - C„( 6 »)|. By (hi) sup 0 g 0 |Ft(0) - £*(0)] = 
Op(p*). Similarly supeg 0 Er=i ll^i(^)l “ l'£^t(^')ll = Op(l) hence supggQ |?|fc^)(0) - 
£(^^^j(0)| 0, hence by Lemma B.2 supgg 0 |Sii(0) — C„(0)| —>■ 0. Therefore by dominated 

convergence l/nX^Li E[supg^Q\In,ko) - 4 ^t^( 0 )|] < Kn-^ Y.t=i P* + o(l) = o(l)- ^ 


Proof of Lemma A.8. Define 3n,t(^) '■= i^)]- By the same 

arguments used to prove approximation Lemma A.l: i/nJ27=i^k^n)^!fk^n) = ^/n x 
Ykt=i (^n)(l + Op(l)). Since 3n,t{9) is uniformly integrable and geometrically /3- 

mixing by Assumption 1(d), it follows ^/nYkt=i 3ra.i(^) 1 by Theorem 2 and Example 4 

in Andrews [2]. Moreover, since 3n,t(9) is trivially Li-bounded uniformly in 0, 3n,t(^} be¬ 
longs to a separable Banach space, hence Li-bracketing numbers satisfy A[ j (e, 0, || • ||i) < 
00 (Dudley [20], Proposition 7.1.7). Combine the pointwise law and A[ ](£r,0, [j • l]i) < 00 
to deduce supgg 0 3Ti,i(0)| A 0 by Theorem 7.1.5 of Dudley [20]. Therefore 

l/n^]]^i£ 4 ^( 0„)/,^^4 ( 0 „)/£'[£ 4 ^( 0„)/^^4 (0„)] A 1. Further, by the definition of a deriva¬ 
tive: 1£;[£2(0)/^)(0)] - E[£flkh\< ||(9/a0)£;[£2(0)/^)(0)]l,ol| X 110-0011 x (1 +o(l)). 
By the same argument as Lemma A. 6 (c) we can write 


AE[£^(e)iiO(e)]\t, = E 




00 


A£) 

^n,t 


X (l + o(l)) 


= -2£;[£4e?4'4)s4]x (1 + 0(1)), 


and trivially E[£tepik^t] = E[£fstlkh - E[£t5tlkh = E[£!sjkh = E[£!lkh x E[5t]. 
Therefore |£;[£|(0)4^)(0)] - E[£flkh\< K\E[£?lkh\ x ||0- 0O|1 x (1 + o(l)). Now use 
9n 9^ by Theorem 2.1 and inf„>ArP[£|/^^ 4 ] > 0 for some A > 1 to deduce A[£|(0„) x 
lkJ{9n)]/E[£flkh ^ 1. This proves l/nj:ti£k^n)lkhSn)/E[£flkh 4 1. □ 

Supplementary Material

Supplement to “Robust estimation and inference for heavy tailed GARCH”

(DOI: 10.3150/14-BEJ616SUPP; .pdf). We prove Lemmas A.1, A.3, A.4 and A.6, and
Lemmas B.1 and B.2. Assume all functions satisfy Pollard’s [54] permissibility criteria, the
measure space that governs all random variables in this paper is complete, and therefore
all majorants are measurable. Cf. Dudley [19]. Probability statements are therefore with
respect to outer probability, and expectations over majorants are outer expectations.